Why Steering Works: Toward a Unified View of Language Model Parameter Dynamics
IntermediateZiwen Xu, Chenyan Wu et al.Feb 2arXiv
The paper shows that three popular ways to control language models—fine-tuning a few weights, LoRA, and activation steering—are actually the same kind of action: a dynamic weight update driven by a control knob.
#language model steering#dynamic weight updates#activation steering