Selective Steering is a new way to gently nudge a language model’s inner thoughts without breaking its flow or skills.
This paper introduces YaPO, a way to gently nudge a language model’s hidden thoughts so it behaves better without retraining it.