Multi-agent systems are like teams of smart helpers, but one bad message can mislead the whole team.
This paper says we should measure an AI agent’s uncertainty across its whole conversation, not just on one final answer.
The paper asks a simple question: do video AIs really need to “think out loud” every time, or can they answer quickly most of the time and think deeply only when needed?