AdaptMMBench is a new test that checks whether AI models know when to just look and think, and when to use extra visual tools such as zooming in on or brightening an image.
AACR-Bench is a new test set that checks how well AI can review code using context from the whole project, not just one file.
MMDeepResearch-Bench (MMDR-Bench) is a new test that checks how well AI “deep research agents” write long, citation-rich reports using both text and images.