Papers (2)

#orchestrator

General Agent Evaluation

Intermediate

Elron Bandel, Asaf Yehudai et al. · Feb 26 · arXiv

This paper shows how to fairly test "general-purpose" AI agents, which are meant to work across many environments without environment-specific tweaks.

#general-purpose agents · #agent evaluation · #unified protocol


OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent

Intermediate

Bowen Yang, Kaiming Jin et al. · Jan 12 · arXiv

Computer-using agents kept forgetting important visual details over long tasks and could not reliably find up-to-date, step-by-step help for unfamiliar apps.

#computer-using agents · #vision-language models · #milestone memory
