The paper asks a simple question: Which step-by-step explanations from a teacher model actually help a student model learn to reason better?
This paper is the first big map of how AI can fix real software problems, not just write short code snippets.
TranslateGemma is a family of open machine translation models fine-tuned from Gemma 3 to translate many languages more accurately.
X-Coder shows that models can learn expert-level competitive programming using data that is 100% synthetic, with no real contest problems needed.
Preference tuning teaches language models to act the way people like, but those habits can fall apart when the topic or style changes (domain shift).
EnvScaler is an automatic factory that builds many safe, rule-following practice worlds where AI agents can talk to users and call tools, just like real apps.
Long-term AI helpers remember past chats, but using all memories can trap them in old ideas (Memory Anchoring).
Multi-agent systems are like teams of expert helpers; the tricky part is choosing which helpers to ask for each question.
Supervised fine-tuning (SFT) often makes a model great at a new task but worse at its old skills; this paper explains a key reason why and shows how to fix it.
This paper builds an open, end-to-end ecosystem (ALE) that lets AI agents plan, act, and fix their own mistakes across many steps in real computer environments.
The paper teaches AI to write strong research plans by letting it grade its own work using checklists (rubrics) pulled from real scientific papers.
Robust-R1 teaches vision-language models to notice how a picture is damaged, think through what that damage hides, and then answer as if the picture were clear.