This paper argues we should measure an AI agent's uncertainty across its whole conversation, not just its final answer.
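A minimal sketch of the idea: aggregate per-turn uncertainty over the whole dialogue instead of scoring only the last turn. The helpers `turn_confidence` and `conversation_uncertainty` are hypothetical, and the per-turn token log-probs are made-up values standing in for real model outputs.

```python
def turn_confidence(token_logprobs):
    """Mean token log-probability for one turn (higher = more confident)."""
    return sum(token_logprobs) / len(token_logprobs)

def conversation_uncertainty(turns):
    """Average uncertainty across every agent turn in the dialogue.

    `turns` is a list of token-logprob lists, one per turn.
    Uncertainty is taken here as negative mean confidence.
    """
    confidences = [turn_confidence(t) for t in turns]
    return -sum(confidences) / len(confidences)

# Three turns with invented token log-probs.
turns = [
    [-0.1, -0.2, -0.1],   # early turn: fairly confident
    [-1.5, -2.0, -1.0],   # middle turn: much less certain
    [-0.3, -0.2, -0.4],   # final answer: looks confident again
]

final_only = -turn_confidence(turns[-1])
whole_dialogue = conversation_uncertainty(turns)
# Scoring only the final answer hides the shaky middle turn.
print(final_only < whole_dialogue)  # → True
```

The point of the toy numbers: a confident-looking final answer can mask a turn mid-conversation where the model was guessing, which is exactly what a whole-conversation measure catches.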
This paper shows that many AI pipelines work best when a small 'compressor' model first distills long text into a short, information-dense summary, and a larger 'predictor' model then reasons over that summary.
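A rough sketch of that two-stage split, with both models stubbed out: `compress` stands in for the small summarizer (here, naive word-budget truncation) and `predict` for the larger reasoner (here, a trivial keyword check). Both function names and the string logic are illustrative assumptions, not the paper's actual interface.

```python
def compress(long_text: str, budget: int = 20) -> str:
    """Small 'compressor' model: shrink long input to a short summary.
    Stubbed as truncation to the first `budget` words."""
    return " ".join(long_text.split()[:budget])

def predict(summary: str, question: str) -> str:
    """Bigger 'predictor' model: reasons over the summary alone,
    never the full document. Stubbed as a keyword lookup."""
    if "revenue" in question and "revenue" in summary:
        return "The summary mentions revenue."
    return "Not found in summary."

# A long document the predictor never sees directly.
document = "Q3 report: revenue grew 12 percent year over year. " * 50
summary = compress(document)
answer = predict(summary, "What happened to revenue?")
print(answer)  # → The summary mentions revenue.
```

The design point is the interface: the predictor's context stays small and fixed-size no matter how long the raw input grows, which is why the compressed summary must be information-dense.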