This paper studies how tool-using AI agents express how sure they are and finds a split: some tools make them overconfident, while others help them stay honest.
The paper teaches small language models to predict open-ended future events by turning daily news into thousands of safe, graded practice questions.