Humans often make guesses about the world that are likely but not certain; this paper studies how humans and AI compare at making such guesses.
This paper teaches AI agents to decide when to explore for more information and when to act right away.
The paper shows that making a model write a number as a sequence of digits and then grading the whole number at the end works better than grading each digit separately.
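To make the contrast in the last summary concrete, here is a minimal Python sketch of the two grading schemes. It is an illustration under stated assumptions, not the paper's method: the function names `per_digit_score` and `whole_number_score`, the zero-padded digit strings, and the use of absolute error as the whole-number grade are all hypothetical choices made for this example.

```python
def per_digit_score(pred: str, target: str) -> float:
    """Grade each digit separately: fraction of positions that match."""
    # Assumption: both numbers are zero-padded to the same length.
    if len(pred) != len(target):
        raise ValueError("sketch assumes equal-length digit strings")
    return sum(p == t for p, t in zip(pred, target)) / len(target)


def whole_number_score(pred: str, target: str) -> float:
    """Grade the decoded number once: negative absolute error (higher is better)."""
    # Assumption: absolute error stands in for whatever grade the paper uses.
    return float(-abs(int(pred) - int(target)))


target = "1000"
near_miss = "0999"  # off by 1, yet every digit differs
far_miss = "1900"   # off by 900, yet three of four digits match

for name, pred in [("near_miss", near_miss), ("far_miss", far_miss)]:
    print(name, per_digit_score(pred, target), whole_number_score(pred, target))
```

In this toy case the per-digit grade prefers `far_miss` while the whole-number grade prefers `near_miss`, which is one plausible intuition for why grading the finished number could give a more useful signal.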