Large language models can act unpredictably in sensitive places like schools, hospitals, and customer support, so we need reliable ways to guide how they talk and behave.
The paper shows that three popular ways to control language models—fine-tuning a few weights, LoRA, and activation steering—are actually the same kind of action: a dynamic weight update driven by a control knob.