This paper argues that we should measure an AI agent’s uncertainty across its whole conversation, not just on its final answer.
This paper asks a simple question: do video AI models really need to “think out loud” every time, or can they answer quickly most of the time and think deeply only when needed?