Longer explanations are not always better; the shape of thinking matters.
The paper shows that when reasoning AIs are trained with reinforcement learning, treating every wrong answer the same makes the AI overconfident in some bad reasoning paths and less diverse overall.
This paper shows a simple way to turn many 'too-easy' questions into harder, still-checkable ones so that the AI keeps learning instead of stalling.
The paper teaches large language models to do what good students do: find where they went wrong, turn that lesson into a rule, and remember it for next time.
Golden Goose turns messy internet text into clean multiple-choice puzzles that computers can learn from and get automatic rewards for.
The paper trains AI agents more effectively by grading not just their final answers but also how they think and use tools along the way.
LightOnOCR-2-1B is a single, compact AI model that reads PDF pages and scans and turns them into clean, well-ordered text without using fragile multi-step OCR pipelines.