This paper says we should measure an AI agent’s uncertainty across its whole conversation, not just on one final answer.
This paper asks a simple question with big consequences: can today’s AI models actively explore a new space and build a trustworthy internal map of it?