The paper asks a simple question: what must a vision model's internal representations (embeddings) look like if it can recognize novel combinations of things it already knows?
RAPTOR is a simple, fast way to find a direction (a concept vector) inside a frozen language model that points toward a concept like 'sarcasm' or 'positivity.'
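To make the idea of a concept vector concrete, here is a generic sketch of one common way such directions are found: take the difference of mean activations between examples that express the concept and examples that don't. This is an illustration with synthetic data, not necessarily RAPTOR's actual procedure; the dimensions and shift size are made up for the toy example.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # toy hidden-state dimensionality

# A hidden "ground truth" concept direction, used only to generate toy data.
true_dir = rng.normal(size=d)
true_dir /= np.linalg.norm(true_dir)

# Fake frozen-model activations: concept examples are shifted along true_dir.
neutral = rng.normal(size=(100, d))
concept = rng.normal(size=(100, d)) + 3.0 * true_dir

# Difference-of-means concept vector, normalized to unit length.
v = concept.mean(axis=0) - neutral.mean(axis=0)
v /= np.linalg.norm(v)

# Scoring new activations: project onto the concept direction.
# Concept-bearing examples should score higher on average.
print(float(concept @ v).__class__ if False else (concept @ v).mean() > (neutral @ v).mean())
```

Once such a vector is found, projecting any activation onto it gives a scalar "how much of this concept is present" score, which is what makes the approach fast at inference time.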