Shaping capabilities with token-level data filtering
Intermediate · Neil Rathi, Alec Radford · Jan 29 · arXiv
The paper shows a simple way to keep AI models from learning unwanted topics: during pretraining, remove only the specific tokens (words or word pieces) related to those topics.
#token-level data filtering #capability shaping #sparse autoencoders
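
To make the idea of token-level filtering concrete, here is a minimal sketch in Python. It is an illustration, not the authors' implementation: the `BLOCKLIST`, `is_unwanted`, and `filter_tokens` names are hypothetical, whitespace splitting stands in for a real tokenizer, and a fixed word list stands in for whatever classifier actually decides which tokens are unwanted.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for a learned token-level classifier.
# A real system would score tokens in context rather than use a word list.
BLOCKLIST = {"virus", "pathogen"}


def is_unwanted(token: str) -> bool:
    """Flag a token as belonging to an unwanted topic (toy rule)."""
    return token.lower() in BLOCKLIST


def filter_tokens(
    tokens: Sequence[str], flag: Callable[[str], bool]
) -> Tuple[List[str], List[int]]:
    """Return the kept tokens and a per-token mask (1 = keep, 0 = filtered)."""
    mask = [0 if flag(t) else 1 for t in tokens]
    kept = [t for t, m in zip(tokens, mask) if m]
    return kept, mask


# Whitespace splitting is an assumption used only to keep the example small.
doc = "The virus spreads while the weather stays mild".split()
kept, mask = filter_tokens(doc, is_unwanted)
print(kept)
print(mask)
```

The function returns both the shortened token list and the mask because there are two plausible ways to apply such a filter: drop the flagged tokens from the training data, or keep them as context and use the mask to exclude them from the training loss. Which of these the paper uses is not stated in the summary above.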