Dr. Zero is a pair of AI agents (a Proposer and a Solver) that teach each other to do web-search-based reasoning without any human-written training data.
Solar Open is a giant bilingual AI (102 billion parameters) that focuses on helping underserved languages like Korean catch up with English-level AI quality.
X-Coder shows that models can learn expert-level competitive programming using data that is 100% synthetic—no real contest problems needed.
Real instructions often have logic like and first-then and if-else and this paper teaches models to notice and obey that logic.
The paper fixes a big problem in training web-searching AI: rewarding only the final answer makes agents cut corners and sometimes hallucinate.
This paper says long chain-of-thought (Long CoT) works best when it follows a 'molecular' pattern with three kinds of thinking bonds: Deep-Reasoning, Self-Reflection, and Self-Exploration.
EnvScaler is an automatic factory that builds many safe, rule-following practice worlds where AI agents can talk to users and call tools, just like real apps.
The paper asks a simple question: do video AIs really need to “think out loud” every time, or can they answer quickly most of the time and think deeply only when needed?
RelayLLM lets a small model do the talking and only asks a big model for help on a few, truly hard tokens.
TourPlanner is a travel-planning system that first gathers the right places, then lets multiple expert ‘voices’ debate plans, and finally polishes the winner with a learning method that follows rules before style.
Multi-agent systems are like teams of expert helpers; the tricky part is choosing which helpers to ask for each question.
The paper teaches AI models to plan their thinking time like a smart test-taker who has to finish several questions before the bell rings.