Agentic AIs don’t just chat; they plan, use tools, and take many steps, so one wrong click can cause real harm.
This paper introduces MMDeepResearch-Bench (MMDR-Bench), a new benchmark that tests how well AI “deep research agents” write long, citation-rich reports that draw on both text and images.