This paper trains AI agents more effectively by grading not just their final answers, but also how they reason and use tools along the way.
This paper shows a simple way for AI models to keep learning new things without forgetting what they already know.
This paper shows that giving an AI a safe, tiny virtual computer (a sandbox) lets it solve many kinds of problems better, not just coding ones.
This paper explains how to turn large language models (LLMs) from quiet students that only answer questions into active agents that can plan, act, and learn over time.
MatchTIR trains AI agents by scoring each tool call individually instead of giving the same reward to every step.