The paper asks a simple question: do video AIs really need to “think out loud” every time, or can they answer quickly most of the time and think deeply only when needed?
This paper teaches AI to look around a 3D place step by step, instead of staring at a fixed set of pictures, so it can answer tricky spatial questions better.
RelayLLM lets a small model do the talking and asks a big model for help only on a few truly hard tokens.
DocDancer is a smart document helper that answers questions by exploring and reading long, mixed-media PDFs using just two tools: Search and Read.
VerseCrafter is a video world model that lets you steer both the camera and multiple moving objects by editing a single 4D world state.
Re-Align is a new way for AI to make and edit pictures by thinking in clear steps before drawing.
This survey explains how AI judges are changing from single smart readers (LLM-as-a-Judge) into full-on agents that can plan, use tools, remember, and work in teams (Agent-as-a-Judge).
Big reasoning AIs think in many steps, which is slow and costly.
Long-term AI helpers remember past chats, but drawing on all of those memories can trap them in old ideas (Memory Anchoring).
Big all-in-one language models are powerful but too expensive to run everywhere, while small specialists are cheaper but narrow.
The paper shows that big language models often get stuck with weight sizes set by training hyperparameters instead of by the data, which quietly hurts performance.
SmartSearch teaches search agents to fix their own bad search queries while they are thinking, not just their final answers.