Papers (2)

#Sparse Autoencoder

PolySAE: Modeling Feature Interactions in Sparse Autoencoders via Polynomial Decoding

Panagiotis Koromilas, Andreas D. Demou et al. · Feb 1 · arXiv

PolySAE is a sparse autoencoder that keeps a simple linear encoder for finding features but pairs it with a polynomial decoder that can multiply features together to model how they interact.

#Sparse Autoencoder #Polynomial Decoder #Feature Interactions

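To make the PolySAE summary concrete, here is a minimal sketch of a linear encoder paired with a degree-2 decoder. It is an illustration under stated assumptions, not the paper's parameterization: the names `encode`, `decode_poly`, `W_enc`, `D`, and `pair_dirs` are hypothetical, the weights are hand-picked toy values, and storing one explicit direction per feature pair is a simplification chosen for readability.

```python
def encode(x, W_enc, b_enc):
    """Linear encoder followed by ReLU: z = relu(W_enc @ x + b_enc)."""
    pre = [sum(w * a for w, a in zip(row, x)) + b for row, b in zip(W_enc, b_enc)]
    return [max(0.0, p) for p in pre]


def decode_poly(z, D, pair_dirs, b_dec):
    """Degree-2 decoder: a linear term plus one direction per feature pair.

    x_hat = b_dec + sum_i z[i] * D[i] + sum_(i,j) z[i] * z[j] * pair_dirs[(i, j)]
    """
    out = list(b_dec)
    # Linear term: each active feature adds its own decoder direction.
    for zi, direction in zip(z, D):
        for k in range(len(out)):
            out[k] += zi * direction[k]
    # Interaction term: only contributes when both features in a pair are active.
    for (i, j), direction in pair_dirs.items():
        weight = z[i] * z[j]
        for k in range(len(out)):
            out[k] += weight * direction[k]
    return out


# Toy example (assumed values): 2-dimensional input, 3 features.
x = [1.0, 2.0]
W_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b_enc = [0.0, 0.0, 0.0]
D = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pair_dirs = {(0, 1): [0.5, -0.5]}
b_dec = [0.0, 0.0]

z = encode(x, W_enc, b_enc)
x_hat = decode_poly(z, D, pair_dirs, b_dec)
```

In this toy setup features 0 and 1 are both active, so the pair term shifts the output away from the purely linear reconstruction `[1.0, 2.0]`. A purely linear decoder cannot express that kind of product between features.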

YaPO: Learnable Sparse Activation Steering Vectors for Domain Adaptation

Intermediate

Abdelaziz Bounhar, Rania Hossam Elmohamady Elbadry et al. · Jan 13 · arXiv

This paper introduces YaPO, a way to learn sparse steering vectors that gently nudge a language model’s hidden activations so it behaves better in a target domain without retraining it.

#Activation Steering #Sparse Autoencoder #Preference Optimization

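The general idea behind activation steering can be sketched in a few lines: a vector is added to a hidden state at inference time while the model weights stay fixed. This is a generic illustration, not YaPO's training procedure. The function `steer`, the `strength` parameter, and the toy values are hypothetical, and how the paper learns the vector or where it applies it is not shown here.

```python
def steer(hidden, vector, strength=1.0):
    """Return hidden + strength * vector without touching any model weights."""
    if len(hidden) != len(vector):
        raise ValueError("hidden state and steering vector must have the same size")
    return [h + strength * v for h, v in zip(hidden, vector)]


# Toy hidden state and a sparse steering vector (assumed values):
# only one coordinate of the vector is non-zero, so only that coordinate moves.
hidden = [0.5, -1.0, 2.0, 0.0]
sparse_vector = [0.0, 2.0, 0.0, 0.0]

steered = steer(hidden, sparse_vector, strength=0.5)
```

Because the vector is sparse, the edit touches a single coordinate and leaves the rest of the hidden state unchanged, which is one reason sparse steering directions are attractive to inspect.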