Sanity Checks for Sparse Autoencoders: Do SAEs Beat Random Baselines?

Intermediate

Anton Korznikov, Andrey Galichin et al. · Feb 15 · arXiv

Sparse autoencoders (SAEs) are a popular tool for explaining what large language models are doing, but this paper shows that they often fail to learn real, meaningful features.

#sparse autoencoders #interpretability #dictionary learning