The paper improves how AI agents are trained by grading not just their final answers, but also how they reason and use tools along the way.
Machine learning agents usually improve by writing code, running it for hours, and then using the results to tweak the next try, which is very slow.