This paper shows that having an AI generate short videos can help it plan and reason visually better than writing out steps in text.
The paper asks a simple question: do video AIs really need to “think out loud” every time, or can they answer quickly most of the time and think deeply only when needed?
AdaTooler-V teaches an image-and-video AI to first ask, “Do I really need a tool?” before using one, which saves time and boosts accuracy.
The paper shows that video AIs do not need long, human-like chains of thought to reason well.