This paper teaches an AI model to pay attention better by training where it focuses, not just the words it produces.
BatCoder teaches a code model to write both code and its documentation by doing a round trip: from code to docs and back to code.
This paper shows how to keep training a language model while it is solving one hard, real problem, so it can discover a single, truly great answer instead of many average ones.
JudgeRLVR teaches a model to be a strict judge of answers before it learns to generate them, so that it discards bad ideas early.
The paper shows that making a model write a number as a sequence of digits and then grading the whole number at the end works better than grading each digit separately.
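
To make the last summary concrete, here is a minimal Python sketch contrasting the two grading schemes. It is an illustration only, not the paper's method: the function names `per_digit_score` and `whole_number_error` are hypothetical, and using absolute error as the whole-number grade is an assumption.

```python
def per_digit_score(pred: str, target: str) -> float:
    """Digit-by-digit grading: fraction of positions where the digits agree."""
    # Left-pad with zeros so both strings have the same number of digits.
    width = max(len(pred), len(target))
    p, t = pred.zfill(width), target.zfill(width)
    return sum(a == b for a, b in zip(p, t)) / width


def whole_number_error(pred: str, target: str) -> int:
    """Whole-number grading (assumed form): absolute error of the full number."""
    return abs(int(pred) - int(target))


if __name__ == "__main__":
    target = "1000"
    for pred in ("0999", "9000"):
        print(pred, per_digit_score(pred, target), whole_number_error(pred, target))
```

In this toy case, `0999` matches no digit of `1000` yet is off by only 1, while `9000` matches three of four digits yet is off by 8000, so a per-digit grade can rank a far-off answer above a near miss.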