Linear representations in language models can change dramatically over a conversation
IntermediateAndrew Kyle Lampinen, Yuxuan Li et al.Jan 28arXiv
Language models store ideas along straight-line directions inside their brains (representations), like sliders for βtruthβ or βethics.β
#linear representations#factuality#ethics