The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models
Intermediate | Christina Lu, Jack Gallagher et al. | Jan 15 | arXiv
Language models can play many characters, but post-training typically makes a helpful Assistant their default persona.
#Assistant Axis #persona drift #activation capping