AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios
Intermediate | Zhaochen Su, Jincheng Gao et al. | Feb 26 | arXiv
AgentVista is a new test (benchmark) that checks whether AI agents can solve tough, real-life picture-based problems by using multiple tools over many steps.
#AgentVista #multimodal agents #visual grounding