TIDE: Trajectory-based Diagnostic Evaluation of Test-Time Improvement in LLM Agents
Intermediate | Hang Yan, Xinyu Che et al. | Feb 2 | arXiv
This paper studies how AI agents get better while they are working, not just whether they finish the job.
#Test-Time Improvement #LLM agents #trajectory analysis