X-Coder shows that models can learn expert-level competitive programming using data that is 100% synthetic—no real contest problems needed.
Real instructions often contain logical structure such as "and", "first-then", and "if-else", and this paper teaches models to notice and obey that logic.
The paper fixes a big problem in training web-searching AI: rewarding only the final answer makes agents cut corners and sometimes hallucinate.
This paper says long chain-of-thought (Long CoT) works best when it follows a 'molecular' pattern with three kinds of thinking bonds: Deep-Reasoning, Self-Reflection, and Self-Exploration.
EnvScaler is an automatic factory that builds many safe, rule-following practice worlds where AI agents can talk to users and call tools, just like real apps.
The paper asks a simple question: do video AIs really need to “think out loud” every time, or can they answer quickly most of the time and think deeply only when needed?
RelayLLM lets a small model do the talking and only asks a big model for help on a few, truly hard tokens.
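The relaying idea can be sketched as confidence-based token routing: the small model proposes each token, and the large model is consulted only when the small model's confidence drops below a threshold. This is a minimal illustrative sketch with stubbed-out models and a made-up threshold, not RelayLLM's actual implementation:

```python
def relay_decode(small_model, large_model, prompt, max_tokens=6, threshold=0.5):
    """Generate tokens with the small model, escalating low-confidence steps."""
    tokens = list(prompt)
    big_calls = 0
    for _ in range(max_tokens):
        token, confidence = small_model(tokens)  # (proposed token, its probability)
        if confidence < threshold:               # a "truly hard" token: ask the big model
            token = large_model(tokens)
            big_calls += 1
        tokens.append(token)
    return tokens, big_calls

# Toy stand-ins: the small model is unsure on every third step.
def small_model(tokens):
    return ("s", 0.9) if len(tokens) % 3 else ("?", 0.2)

def large_model(tokens):
    return "L"

out, big_calls = relay_decode(small_model, large_model, ["p"])
# Only 2 of the 6 generated tokens required the large model.
```

The payoff is that most decoding cost stays on the cheap model, with the expensive model invoked for only a small fraction of tokens.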
TourPlanner is a travel-planning system that first gathers the right places, then lets multiple expert ‘voices’ debate plans, and finally polishes the winner with a learning method that follows rules before style.
Multi-agent systems are like teams of expert helpers; the tricky part is choosing which helpers to ask for each question.
The paper teaches AI models to plan their thinking time like a smart test-taker who has to finish several questions before the bell rings.
Unified Thinker separates “thinking” (planning) from “drawing” (image generation) so complex instructions get turned into clear, doable steps before any pixels are painted.
Talk2Move is a training recipe that lets an image editor move, rotate, and resize the exact object you mention using plain text, while keeping the rest of the picture stable.