This paper studies how well AI agents perform at each step of their work, not just whether they finish the job.
ToolPRMBench is a new benchmark that checks, step by step, whether an AI agent using tools picks the right next action.
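
To make the difference between step-by-step checking and outcome-only checking concrete, here is a minimal sketch in Python. It is not ToolPRMBench's actual data format or scoring code. The `Step` record, the `step_accuracy` and `task_completed` helpers, and the tool names in the example are all hypothetical, chosen only for illustration. The sketch assumes each step has a single reference action to compare against.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One decision point in an agent run (hypothetical format)."""
    chosen_action: str     # the action the agent picked
    reference_action: str  # the action labeled as correct for this step


def step_accuracy(trajectory: list[Step]) -> float:
    """Fraction of steps where the agent picked the reference action."""
    if not trajectory:
        return 0.0
    correct = sum(s.chosen_action == s.reference_action for s in trajectory)
    return correct / len(trajectory)


def task_completed(trajectory: list[Step]) -> bool:
    """Outcome-only check. Assumption: the run counts as finished
    when the final step matches its reference action."""
    if not trajectory:
        return False
    last = trajectory[-1]
    return last.chosen_action == last.reference_action


# Illustrative run with made-up tool names: the agent finishes the job
# but takes one wrong action along the way.
trajectory = [
    Step("search_flights", "search_flights"),
    Step("book_hotel", "select_flight"),  # wrong intermediate action
    Step("confirm_booking", "confirm_booking"),
]

print(task_completed(trajectory))
print(step_accuracy(trajectory))
```

In this made-up run the outcome-only check passes, while the step-level score shows that one of the three actions was wrong. An evaluation that only asks whether the job got done would miss that.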