RAPTOR: Ridge-Adaptive Logistic Probes
Intermediate · Ziqi Gao, Yaotian Zhu et al. · Jan 29 · arXiv
RAPTOR is a simple, fast way to find a direction (a concept vector) inside a frozen language model that points toward a concept like 'sarcasm' or 'positivity.'
#probing #concept vectors #activation steering
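
To make the idea of a concept vector concrete, here is a minimal sketch of a generic ridge-regularized (L2) logistic probe. It is not the authors' RAPTOR procedure: the adaptive choice of ridge strength implied by the title is not reproduced, and the penalty `lam` is simply fixed. The function names, hyperparameters, and the synthetic "activations" are all assumptions made for illustration; in practice `X` would hold hidden states extracted from a frozen model and `y` would mark whether each example expresses the concept.

```python
import numpy as np


def fit_ridge_logistic_probe(X, y, lam=1.0, lr=0.1, steps=500):
    """Fit an L2-penalized logistic probe with plain gradient descent.

    X: (n, d) array of activations, y: (n,) array of 0/1 labels.
    lam is a fixed ridge strength (an assumption, not tuned adaptively).
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        z = np.clip(X @ w + b, -30.0, 30.0)  # clip logits to keep exp() stable
        p = 1.0 / (1.0 + np.exp(-z))         # predicted probability of the concept
        grad_w = X.T @ (p - y) / n + lam * w / n  # logistic gradient plus ridge term
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def concept_vector(w):
    """Normalize the probe weights to a unit-length direction."""
    return w / np.linalg.norm(w)


# Synthetic stand-in for frozen-model activations (hypothetical data):
# the concept shifts examples along the first coordinate only.
rng = np.random.default_rng(0)
n, d = 400, 16
true_dir = np.zeros(d)
true_dir[0] = 1.0
y = rng.integers(0, 2, size=n).astype(float)
X = rng.normal(size=(n, d)) + 2.0 * np.outer(2.0 * y - 1.0, true_dir)

w, b = fit_ridge_logistic_probe(X, y)
v = concept_vector(w)

cosine = float(v @ true_dir)  # alignment with the planted direction
accuracy = float(np.mean(((X @ w + b) > 0) == (y == 1)))
print(f"cosine with planted direction: {cosine:.3f}, probe accuracy: {accuracy:.3f}")
```

The normalized weight vector `v` is what would be used as the concept direction, for example by adding a scaled copy of it to a layer's activations when steering. The ridge penalty keeps the weights bounded even when the two classes are nearly separable, which is common for high-dimensional activations.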