The paper proposes training AI agents more effectively by grading not only their final answers but also the intermediate steps that lead to them: how the agents reason and how they use tools along the way.
AT2PO is a new method for training AI agents that operate over multiple turns, for example sending a query to a web search tool, reading the results, and then refining the query and trying again.
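To make "grading the steps, not just the answer" concrete, here is a minimal, generic sketch of turn-level credit assignment. It is not AT2PO's actual formulation. The `Turn` class, the per-turn `step_score`, and the `step_weight` and `gamma` parameters are all hypothetical. Each turn's return mixes its own step grade with the rewards of everything that follows, including the final outcome. With `step_weight=0.0` it falls back to outcome-only grading, where every turn receives the same credit.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One agent turn (a reasoning step or tool call) with a hypothetical step grade."""
    action: str
    step_score: float  # hypothetical grade for this intermediate step, 0.0 to 1.0


def turn_level_returns(turns, final_reward, gamma=1.0, step_weight=0.5):
    """Give each turn a return that combines its own step grade with later rewards.

    Illustrative scheme only (an assumption, not the paper's method):
    return_t = step_weight * step_score_t + gamma * return_{t+1},
    where the return after the last turn is the final outcome reward.
    """
    returns = []
    running = final_reward
    # Walk backwards so each turn is also credited for what follows it.
    for turn in reversed(turns):
        running = step_weight * turn.step_score + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


if __name__ == "__main__":
    episode = [
        Turn("search the web", 1.0),     # useful query
        Turn("read results", 0.0),       # unhelpful step
        Turn("give final answer", 1.0),  # correct answer
    ]
    print(turn_level_returns(episode, final_reward=1.0))                   # [2.0, 1.5, 1.5]
    print(turn_level_returns(episode, final_reward=1.0, step_weight=0.0))  # [1.0, 1.0, 1.0]
```

With step grades included, the useful search turn earns more credit than the unhelpful reading turn, whereas outcome-only grading treats all three turns the same.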