The paper tackles a common problem: people can ask AI to do big, complex tasks, but they can't always explain exactly what they want or check the results well.
The paper shows a simple way to teach AI models what not to learn by removing only the exact words (tokens) related to unwanted topics during pretraining.
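As a rough illustration of token-level removal, the sketch below drops flagged tokens from the training loss while leaving the rest of the text untouched. It is only a sketch under stated assumptions: the keyword blocklist stands in for whatever method the paper actually uses to identify unwanted tokens, masking the loss is just one possible reading of "removing", and the names `keep_mask` and `masked_mean_loss` are hypothetical.

```python
from typing import List, Set


def keep_mask(tokens: List[str], blocklist: Set[str]) -> List[bool]:
    """Return True for tokens that should still contribute to the loss.

    Assumption: a plain keyword blocklist decides which tokens are unwanted.
    A real system would likely use a learned classifier instead.
    """
    return [tok.lower() not in blocklist for tok in tokens]


def masked_mean_loss(per_token_loss: List[float], keep: List[bool]) -> float:
    """Average the per-token loss over kept tokens only.

    Flagged tokens are skipped, so the model gets no training signal from them,
    while neighbouring tokens in the same document are still learned.
    """
    if len(per_token_loss) != len(keep):
        raise ValueError("loss and mask must have the same length")
    kept = [loss for loss, k in zip(per_token_loss, keep) if k]
    # If every token was flagged, there is nothing to learn from this sequence.
    return sum(kept) / len(kept) if kept else 0.0


# Toy example: made-up tokens and made-up per-token losses.
tokens = ["the", "virus", "spreads", "fast"]
losses = [1.0, 4.0, 2.0, 3.0]
mask = keep_mask(tokens, blocklist={"virus"})
loss = masked_mean_loss(losses, mask)
```

In this toy run only "virus" is excluded, so the loss is averaged over the other three tokens. Filtering whole documents would instead discard the surrounding text as well, which is the contrast the summary draws.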