The paper shows how to train a language model with extra hints (privileged information) that are available only during training, so it can still do well later without any hints.
This paper studies how AI agents get better while they are working, not just whether they finish the job.
The paper shows that making a model write a number as a sequence of digits and then grading the whole number at the end works better than grading each digit separately.
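To see why the two ways of grading can disagree, here is a toy sketch in Python. It is only an illustration under stated assumptions, not the paper's method: the function names `per_digit_score` and `whole_number_error`, the example numbers, and the choice of absolute difference as the whole-number grade are all made up for this example.

```python
def per_digit_score(pred: str, target: str) -> float:
    """Fraction of positions where the predicted digit equals the target digit.

    Hypothetical stand-in for "grading each digit separately".
    """
    if len(pred) != len(target):
        raise ValueError("this toy sketch assumes equal-length digit strings")
    matches = sum(p == t for p, t in zip(pred, target))
    return matches / len(target)


def whole_number_error(pred: str, target: str) -> int:
    """Absolute difference after decoding the full digit sequence as an integer.

    Hypothetical stand-in for "grading the whole number at the end".
    """
    return abs(int(pred) - int(target))


target = "500"
close_guess = "499"  # numerically off by 1, but shares no digits with the target
far_guess = "590"    # numerically off by 90, but shares two of three digits

for guess in (close_guess, far_guess):
    print(guess, per_digit_score(guess, target), whole_number_error(guess, target))
```

In this example the per-digit grade prefers 590 because more of its digits match, while the whole-number grade prefers 499 because it is numerically much closer to 500. Grading the finished number rewards being close in value, which matching digits one at a time does not capture.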