This paper teaches AI to look things up on the web and fix its own mistakes mid-thought instead of starting over from scratch.
MARS is an AI agent that runs AI research like a careful scientist and a thrifty engineer at the same time.
This paper teaches teams of AI agents to get better by scoring every move they make, not just the final answer.
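A minimal sketch of the contrast, not the paper's actual method: per-move ("process") scoring puts credit on the step that earned it, while final-answer ("outcome") scoring smears one score across every move. The judge and move names here are illustrative.

```python
from typing import Callable, List

def outcome_scores(moves: List[str], judge: Callable[[str], float]) -> List[float]:
    """One score for the final answer, copied to every move (the baseline)."""
    final = judge(moves[-1])
    return [final] * len(moves)

def process_scores(moves: List[str], judge: Callable[[str], float]) -> List[float]:
    """One score per move, so credit lands on the step that earned it."""
    return [judge(m) for m in moves]

# Toy judge that rewards moves containing a citation.
toy_judge = lambda m: 1.0 if "[source]" in m else 0.0
moves = ["plan the search", "quote evidence [source]", "final answer"]
print(process_scores(moves, toy_judge))  # [0.0, 1.0, 0.0]
print(outcome_scores(moves, toy_judge))  # [0.0, 0.0, 0.0]
```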
DenseGRPO teaches image models using lots of small, timely rewards instead of one final score at the end.
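A rough sketch of the dense-reward idea as the summary describes it: a reward at every denoising step, normalized GRPO-style across a group of samples. The shapes and reward values are assumptions for illustration, not DenseGRPO's real interface.

```python
import numpy as np

def dense_step_advantages(step_rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """step_rewards: (group_size, num_steps), one reward per denoising step.
    Returns per-step advantages normalized across the sample group."""
    mean = step_rewards.mean(axis=0, keepdims=True)  # group mean at each step
    std = step_rewards.std(axis=0, keepdims=True)    # group spread at each step
    return (step_rewards - mean) / (std + eps)

# Dense: every step gets a timely signal. A sparse baseline would put the
# single final image score on the last step only, leaving earlier steps unguided.
rewards = np.random.rand(4, 8)                       # 4 samples, 8 denoising steps
print(dense_step_advantages(rewards).shape)          # (4, 8)
```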
Diffusion language models can write tokens in any order, but that freedom can accidentally hurt their ability to reason well.
The paper introduces Intervention Training (InT), a simple way for a language model to find and fix the first wrong step in its own reasoning using a short, targeted correction.
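A minimal sketch of the intervene-at-first-error loop described above. `verify_step` and `write_correction` stand in for model calls; both are hypothetical names, not InT's actual interface.

```python
from typing import Callable, List

def intervene(steps: List[str],
              verify_step: Callable[[List[str], str], bool],
              write_correction: Callable[[List[str], str], str]) -> List[str]:
    """Return the chain with the first wrong step replaced by a targeted fix,
    keeping every correct step before it instead of restarting from scratch."""
    for i, step in enumerate(steps):
        if not verify_step(steps[:i], step):        # first faulty step found
            fix = write_correction(steps[:i], step) # short, targeted patch
            return steps[:i] + [fix]                # generation resumes from here
    return steps                                    # nothing to fix
```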
This paper introduces MATTRL, a way for multiple AI agents to learn from their own conversations at test time using short, reusable text notes instead of retraining their weights.
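A small sketch of test-time learning via reusable text notes, as the summary frames it: distilled lessons from earlier conversations get injected into later prompts while the weights stay frozen. The class and method names are illustrative, not MATTRL's API.

```python
class NoteMemory:
    def __init__(self, max_notes: int = 20):
        self.notes: list[str] = []
        self.max_notes = max_notes

    def add(self, note: str) -> None:
        """Store a short lesson distilled from a finished conversation."""
        self.notes.append(note)
        self.notes = self.notes[-self.max_notes:]   # keep the memory small

    def augment(self, prompt: str) -> str:
        """Inject the notes into the next prompt; weights never change."""
        if not self.notes:
            return prompt
        header = "Lessons from earlier runs:\n- " + "\n- ".join(self.notes)
        return header + "\n\n" + prompt

memory = NoteMemory()
memory.add("Check units before comparing two quantities.")
print(memory.augment("Solve: how many meters in 3.2 km?"))
```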
This paper teaches a computer agent to grow a toolbox of skills that are real, runnable programs, not just text ideas.
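A minimal sketch of what a library of runnable skills could look like; the registry pattern below is an assumption about the general idea, not the paper's implementation.

```python
from typing import Callable, Dict

class SkillLibrary:
    """Skills are stored as callable programs, so they can be re-run,
    composed, and tested, unlike free-text 'ideas'."""
    def __init__(self):
        self.skills: Dict[str, Callable[..., object]] = {}

    def add(self, name: str, fn: Callable[..., object]) -> None:
        self.skills[name] = fn

    def run(self, name: str, *args, **kwargs):
        return self.skills[name](*args, **kwargs)

library = SkillLibrary()
library.add("count_lines", lambda path: sum(1 for _ in open(path)))
# library.run("count_lines", "notes.txt")  # a reusable, executable skill
```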
Turn-PPO is a new way to train chatty AI agents that act over many steps, by judging each conversation turn as one whole action instead of judging every single token.
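A toy sketch of turn-level credit assignment as described: compute one advantage per conversation turn and broadcast it to every token of that turn, rather than estimating a separate advantage per token. The return and baseline choices here are simplifications, not Turn-PPO's exact estimator.

```python
import numpy as np

def turn_level_advantages(turn_rewards: list[float],
                          turn_token_counts: list[int],
                          gamma: float = 0.9) -> np.ndarray:
    """Treat each turn as one action: discounted return per turn, then
    repeat that value across the turn's tokens for the PPO update."""
    returns, g = [], 0.0
    for r in reversed(turn_rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    adv = np.array(returns) - np.mean(returns)   # simple mean baseline
    return np.repeat(adv, turn_token_counts)     # one value per token

# Three turns with 5, 7, and 3 tokens; only the last turn is rewarded.
print(turn_level_advantages([0.0, 0.0, 1.0], [5, 7, 3]))
```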
TreeGRPO teaches image generators using a smart branching tree so each training run produces many useful learning signals instead of just one.
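A minimal sketch of the branching idea: fork a generation at intermediate steps so one run yields many scored leaves, then compare each leaf against the group for a GRPO-style advantage. `denoise_step` and `score` are stand-ins for the real model and reward, and the toy dynamics below are purely illustrative.

```python
import random

def rollout_tree(state, depth, branch_factor, denoise_step, score):
    """Return the score of every leaf reachable by branching the trajectory."""
    if depth == 0:
        return [score(state)]
    scores = []
    for _ in range(branch_factor):               # fork the trajectory here
        child = denoise_step(state)
        scores.extend(rollout_tree(child, depth - 1, branch_factor,
                                   denoise_step, score))
    return scores

# Toy usage: 2 branches at each of 3 steps -> 8 leaves from one run.
leaves = rollout_tree(0.0, 3, 2,
                      denoise_step=lambda s: s + random.random(),
                      score=lambda s: s)
mean = sum(leaves) / len(leaves)
advantages = [l - mean for l in leaves]          # many signals per run, not one
print(len(leaves), advantages[:2])
```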