Accuracy alone can make AI agents look good on paper even when they fail in real-world use; this paper shows how to measure reliability properly.
This paper teaches language models to be safer, more factual, and higher in quality during pretraining, not only afterward, by using reinforcement learning with a stronger model as a helper.