MatchTIR teaches AI agents to judge each tool call step-by-step instead of giving the same reward to every step.
This paper teaches a model to turn a question about a table into both a short answer and a clear, correct chart.
This paper teaches AI agents to learn new reusable skills and get better over time by using reinforcement learning, not just prompts.