Papers (5)

#multimodal agents

AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios

Zhaochen Su, Jincheng Gao et al., Feb 26, arXiv

AgentVista is a new test (benchmark) that checks whether AI agents can solve tough, real-life picture-based problems by using multiple tools over many steps.

#AgentVista, #multimodal agents, #visual grounding

MMA: Multimodal Memory Agent

Intermediate

Yihao Lu, Wanru Cheng et al., Feb 18, arXiv

Long-horizon AI assistants can grab old, low-quality, or conflicting memories and then answer with too much confidence, which is dangerous.

#memory-augmented LLMs, #multimodal agents, #reliability scoring

GameDevBench: Evaluating Agentic Capabilities Through Game Development

Intermediate

Wayne Chi, Yixiong Fang et al., Feb 11, arXiv

GameDevBench is a new test that checks if AI agents can actually make parts of video games, not just write code in one file.

#GameDevBench, #Godot, #multimodal agents

DeepImageSearch: Benchmarking Multimodal Agents for Context-Aware Image Retrieval in Visual Histories

Intermediate

Chenlong Deng, Mengjie Deng et al., Feb 11, arXiv

Most image search systems judge each photo by itself, which fails when clues are split across many photos taken over time.

#context-aware image retrieval, #multimodal agents, #visual history exploration

Video-Browser: Towards Agentic Open-web Video Browsing

Beginner

Zhengyang Liang, Yan Shu et al., Dec 28, arXiv

The paper tackles how AI agents can truly research the open web when the answers are hidden inside long, messy videos, not just text.

#agentic video browsing, #pyramidal perception, #video understanding
