AdaptMMBench is a new test that checks whether AI models know when to just look and think, and when to reach for extra visual tools like zooming in on or brightening an image.
AdaReasoner teaches AI to pick the right visual tools, use them in the right order, and stop using them when they aren’t helping.
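To make that idea concrete, here is a minimal sketch of an adaptive tool loop: try a tool, keep the result only if it helps, and stop when tools stop helping. Everything here is illustrative; the tool names, the confidence-based stopping rule, and the `DummyModel` interface are assumptions for the sketch, not AdaReasoner's actual algorithm.

```python
from typing import Callable, Optional

TOOLS: dict[str, Callable] = {
    "zoom": lambda img: img + " [zoomed]",          # stand-ins: real tools
    "brighten": lambda img: img + " [brightened]",  # would edit pixels
}

def adaptive_reason(model, image: str, question: str, max_steps: int = 3) -> str:
    best_conf = model.confidence(image, question)
    for _ in range(max_steps):
        choice: Optional[str] = model.pick_tool(image, question, list(TOOLS))
        if choice is None:        # model decides plain looking is enough
            break
        candidate = TOOLS[choice](image)
        conf = model.confidence(candidate, question)
        if conf <= best_conf:     # the tool didn't help, so stop using tools
            break
        image, best_conf = candidate, conf
    return model.answer(image, question)

class DummyModel:
    """Toy stand-in for a vision-language model (illustrative only)."""
    def confidence(self, image, question):
        return 0.5 + 0.1 * image.count("[")  # pretend each edit helps a bit
    def pick_tool(self, image, question, tools):
        return tools[0] if "[" not in image else None
    def answer(self, image, question):
        return f"answer based on: {image}"

print(adaptive_reason(DummyModel(), "photo.jpg", "What does the sign say?"))
```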
Long texts are expensive for AI to read: each new token has to be compared against every token that came before it, so compute and memory costs climb fast as the text gets longer.
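A quick back-of-envelope calculation shows why. The numbers below (a hypothetical 32-layer model with a 4096-dim hidden state and an fp16 cache) are illustrative, but the scaling is the point: memory grows linearly with context length, while the number of token-to-token comparisons grows roughly quadratically.

```python
# Illustrative numbers: a hypothetical 32-layer model with a 4096-dim hidden
# state, storing keys and values in fp16 (2 bytes per number).
layers, d_model, bytes_per_val = 32, 4096, 2

def kv_cache_bytes(tokens: int) -> int:
    # Each token keeps one key and one value vector per layer in memory.
    return tokens * layers * 2 * d_model * bytes_per_val

def attention_pairs(tokens: int) -> int:
    # Each new token is compared against all tokens before it: ~n^2/2 pairs.
    return tokens * (tokens + 1) // 2

for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} tokens: {kv_cache_bytes(n) / 1e9:5.2f} GB cache, "
          f"{attention_pairs(n):.1e} comparisons")
```

Going from 1,000 to 100,000 tokens multiplies the memory by 100 but the comparisons by roughly 10,000, which is why long contexts get expensive so quickly.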
This paper teaches a vision-language model to think about images by talking to copies of itself, using only words to plan and decide.
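Here is a minimal sketch of what text-only self-dialogue can look like: one model plays both a planner and a critic by receiving different role prompts, and the shared transcript is plain text. The role names, the two-round structure, and the `generate` interface are assumptions for illustration, not the paper's exact protocol.

```python
class EchoModel:
    """Toy model for demonstration: replies by echoing its last instruction."""
    def generate(self, image, prompt: str) -> str:
        return f"(reply to: {prompt.splitlines()[-1]!r})"

def self_dialogue(model, image, question: str, rounds: int = 2) -> str:
    transcript = f"Question about the image: {question}"
    for _ in range(rounds):
        plan = model.generate(image, transcript + "\nPlanner: propose the next step.")
        critique = model.generate(
            image, transcript + f"\nPlanner said: {plan}\nCritic: point out any flaw."
        )
        transcript += f"\nPlanner: {plan}\nCritic: {critique}"
    # The same model, seeing the whole conversation, commits to an answer.
    return model.generate(image, transcript + "\nGive the final answer.")

print(self_dialogue(EchoModel(), "photo.jpg", "How many birds are there?"))
```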