General Agent Evaluation
Intermediate | Elron Bandel, Asaf Yehudai et al. | Feb 26 | arXiv
This paper shows how to fairly evaluate "general-purpose" AI agents, which are expected to work across many environments without environment-specific tuning.
#general-purpose agents #agent evaluation #unified protocol