This paper builds a gigantic library of video puzzles (VBVR) so AI can practice not just making pretty videos, but actually thinking through what happens over time.
This paper shows that making short videos can help AI plan and reason in pictures better than writing out steps in text.
The paper asks a simple question: do video AIs really need to “think out loud” every time, or can they answer quickly most of the time and think deeply only when needed?
AdaTooler-V teaches an image-and-video AI to first ask, “Do I really need a tool?” before using one, which saves time and boosts accuracy.
The paper shows that video AIs do not need long, human-like chains of thought to reason well.