This paper teaches language models to be safer, more factual, and higher quality during pretraining, not just after, by using reinforcement learning with a stronger model as a helper.
Agents often act like tourists without a map: they react to what they see now and miss long-term consequences.