Papers3

#Agentic Reinforcement Learning

This paper examines why training AI agents that act over many steps (such as browsing the web or moving through a house) often becomes unstable and collapses.

The paper improves how AI agents are trained by grading not just their final answers but also how they reason and use tools along the way.

AT2PO is a new way to train AI agents that work over several turns, such as sending a query to the web, reading the result, and trying again.
