YaPO: Learnable Sparse Activation Steering Vectors for Domain Adaptation
Intermediate · Abdelaziz Bounhar, Rania Hossam Elmohamady Elbadry et al. · Jan 13 · arXiv
This paper introduces YaPO, a way to gently nudge a language model's hidden thoughts so it behaves better without retraining it.
#Activation Steering #Sparse Autoencoder #Preference Optimization