PolySAE is a new kind of sparse autoencoder that keeps a simple, linear way to find features but uses a smarter decoder that can multiply features together.
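To make the idea concrete, here is a minimal sketch of what "a linear way to find features plus a decoder that can multiply features together" could look like. It is not PolySAE's actual architecture: the ReLU activation, the restriction to pairwise products, and the names `encode`, `decode`, `W_enc`, `W_lin`, and `pair_terms` are all assumptions made for illustration.

```python
# Illustrative sketch only; names and design choices are assumptions,
# not taken from the PolySAE paper.

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def encode(x, W_enc, b_enc):
    """Linear encoder: an affine map followed by ReLU (assumed activation)."""
    pre = [p + b for p, b in zip(matvec(W_enc, x), b_enc)]
    return [max(0.0, p) for p in pre]

def decode(z, W_lin, pair_terms):
    """Decoder with a linear term plus products of feature pairs.

    W_lin:      output_dim x num_features matrix for the linear term.
    pair_terms: dict mapping a feature pair (i, j) to an output direction;
                each direction is scaled by the product z[i] * z[j].
    """
    out = matvec(W_lin, z)
    for (i, j), direction in pair_terms.items():
        coeff = z[i] * z[j]
        out = [o + coeff * d for o, d in zip(out, direction)]
    return out
```

In this sketch, reading off which features are active still needs only the linear `encode` step; the multiplication happens solely in `decode`.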
This paper introduces YaPO, a way to gently nudge a language model's hidden thoughts so it behaves better without retraining it.
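As a rough picture of what "nudging hidden thoughts" can mean, the sketch below shows generic activation steering: a fixed direction is added to a hidden-state vector while the model's weights stay untouched. This is not YaPO's actual procedure; the function `steer` and its parameters `direction` and `strength` are hypothetical names chosen for illustration.

```python
# Generic activation-steering sketch; not the YaPO method itself.

def steer(hidden, direction, strength):
    """Shift a hidden-state vector along a steering direction.

    hidden:    the model's hidden activations at one position (list of floats).
    direction: a steering vector of the same length (assumed to be given).
    strength:  how hard to push; 0.0 leaves the activations unchanged.
    """
    if len(hidden) != len(direction):
        raise ValueError("hidden and direction must have the same length")
    return [h + strength * d for h, d in zip(hidden, direction)]
```

A small `strength` corresponds to a gentle nudge, and setting it to zero recovers the original model behavior.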