Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models
Intermediate | Hengyuan Zhang, Zhihao Zhang et al. | Jan 20 | arXiv
This survey organizes mechanistic interpretability of large language models into a step-by-step framework, Locate, Steer, and Improve, for turning an understanding of model internals into concrete fixes.
#mechanistic interpretability #residual stream #attention heads