RAPTOR is a simple, fast way to find a concept vector: a direction in a frozen language model's activation space that points toward a concept like 'sarcasm' or 'positivity'.
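
To make the idea of a concept vector concrete, here is a minimal sketch. It is not RAPTOR's procedure, which this summary does not spell out. It swaps in a generic difference-of-means baseline and runs on synthetic vectors that stand in for a frozen model's hidden states. The names `concept_direction`, `true_dir`, and `sample` are hypothetical and chosen only for illustration.

```python
import math
import random


def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def concept_direction(pos_acts, neg_acts):
    """Unit vector from the mean 'concept absent' activation to the
    mean 'concept present' activation (difference-of-means baseline,
    an assumed stand-in, not RAPTOR itself)."""
    mu_pos = mean_vector(pos_acts)
    mu_neg = mean_vector(neg_acts)
    return normalize([p - n for p, n in zip(mu_pos, mu_neg)])


# Synthetic stand-ins for hidden states: Gaussian noise, with the
# positive examples shifted along a known direction.
rng = random.Random(0)
dim = 32
true_dir = normalize([rng.gauss(0, 1) for _ in range(dim)])


def sample(shift):
    return [rng.gauss(0, 1) + shift * t for t in true_dir]


neg_acts = [sample(0.0) for _ in range(300)]  # concept absent
pos_acts = [sample(3.0) for _ in range(300)]  # concept present

v = concept_direction(pos_acts, neg_acts)
cosine = dot(v, true_dir)  # alignment with the planted direction

# Projecting an activation onto v gives a scalar "concept score".
pos_score = sum(dot(a, v) for a in pos_acts) / len(pos_acts)
neg_score = sum(dot(a, v) for a in neg_acts) / len(neg_acts)
```

In this toy setup the recovered direction should align closely with the planted one, and positive examples should score higher than negative ones when projected onto it. With a real model, the synthetic vectors would be replaced by hidden states collected from a fixed layer while the model's weights stay untouched.
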
This survey turns model understanding into a step-by-step repair toolkit with three stages: Locate, Steer, and Improve.