AdaptMMBench is a new test that checks whether AI models know when to just look and think, and when to reach for extra visual tools like zooming in on or brightening an image.
AdaReasoner teaches AI to pick the right visual tools, use them in the right order, and stop using them when they aren’t helping.
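To make that idea concrete, here is a minimal sketch of an adaptive tool loop: try a tool, keep the result only if it helps, and stop when tools stop helping. Everything here is illustrative; the tool names, the confidence-based stopping rule, and the `DummyModel` interface are assumptions for the sketch, not AdaReasoner's actual algorithm.

```python
from typing import Callable, Optional

TOOLS: dict[str, Callable] = {
    "zoom": lambda img: img + " [zoomed]",          # stand-ins: real tools
    "brighten": lambda img: img + " [brightened]",  # would edit pixels
}

def adaptive_reason(model, image: str, question: str, max_steps: int = 3) -> str:
    best_conf = model.confidence(image, question)
    for _ in range(max_steps):
        choice: Optional[str] = model.pick_tool(image, question, list(TOOLS))
        if choice is None:        # model decides plain looking is enough
            break
        candidate = TOOLS[choice](image)
        conf = model.confidence(candidate, question)
        if conf <= best_conf:     # the tool didn't help, so stop using tools
            break
        image, best_conf = candidate, conf
    return model.answer(image, question)

class DummyModel:
    """Toy stand-in for a vision-language model (illustrative only)."""
    def confidence(self, image, question):
        return 0.5 + 0.1 * image.count("[")  # pretend each edit helps a bit
    def pick_tool(self, image, question, tools):
        return tools[0] if "[" not in image else None
    def answer(self, image, question):
        return f"answer based on: {image}"

print(adaptive_reason(DummyModel(), "photo.jpg", "What does the sign say?"))
```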
Long texts are expensive for AI to read: each new token has to be compared against every token that came before it, so compute and memory costs climb fast as the text gets longer.
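A quick back-of-envelope calculation shows why. The numbers below (a hypothetical 32-layer model with a 4096-dim hidden state and an fp16 cache) are illustrative, but the scaling is the point: memory grows linearly with context length, while the number of token-to-token comparisons grows roughly quadratically.

```python
# Illustrative numbers: a hypothetical 32-layer model with a 4096-dim hidden
# state, storing keys and values in fp16 (2 bytes per number).
layers, d_model, bytes_per_val = 32, 4096, 2

def kv_cache_bytes(tokens: int) -> int:
    # Each token keeps one key and one value vector per layer in memory.
    return tokens * layers * 2 * d_model * bytes_per_val

def attention_pairs(tokens: int) -> int:
    # Each new token is compared against all tokens before it: ~n^2/2 pairs.
    return tokens * (tokens + 1) // 2

for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} tokens: {kv_cache_bytes(n) / 1e9:5.2f} GB cache, "
          f"{attention_pairs(n):.1e} comparisons")
```

Going from 1,000 to 100,000 tokens multiplies the memory by 100 but the comparisons by roughly 10,000, which is why long contexts get expensive so quickly.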
This paper teaches a vision-language model to think about images by talking to copies of itself, using only words to plan and decide.
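Here is a minimal sketch of what text-only self-dialogue can look like: one model plays both a planner and a critic by receiving different role prompts, and the shared transcript is plain text. The role names, the two-round structure, and the `generate` interface are assumptions for illustration, not the paper's exact protocol.

```python
class EchoModel:
    """Toy model for demonstration: replies by echoing its last instruction."""
    def generate(self, image, prompt: str) -> str:
        return f"(reply to: {prompt.splitlines()[-1]!r})"

def self_dialogue(model, image, question: str, rounds: int = 2) -> str:
    transcript = f"Question about the image: {question}"
    for _ in range(rounds):
        plan = model.generate(image, transcript + "\nPlanner: propose the next step.")
        critique = model.generate(
            image, transcript + f"\nPlanner said: {plan}\nCritic: point out any flaw."
        )
        transcript += f"\nPlanner: {plan}\nCritic: {critique}"
    # The same model, seeing the whole conversation, commits to an answer.
    return model.generate(image, transcript + "\nGive the final answer.")

print(self_dialogue(EchoModel(), "photo.jpg", "How many birds are there?"))
```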