Humans often make guesses about the world that are likely but not certain. This paper studies how humans and AI compare at making such guesses.
The paper shows that having a model write a number as a sequence of digits and then grading the whole number at the end works better than grading each digit separately.
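To make the difference between the two grading schemes concrete, here is a minimal sketch. It is only an illustration of the distinction, not the paper's actual training objective or metric: the function names `per_digit_score` and `whole_number_error`, and the choice of absolute error as the whole-number grade, are assumptions made for this example.

```python
def per_digit_score(pred: str, target: str) -> float:
    """Grade each digit separately: fraction of positions that match.

    Hypothetical helper; shorter strings are left-padded with zeros.
    """
    width = max(len(pred), len(target))
    p, t = pred.zfill(width), target.zfill(width)
    return sum(a == b for a, b in zip(p, t)) / width


def whole_number_error(pred: str, target: str) -> int:
    """Grade the finished number: absolute distance from the target.

    Hypothetical helper; absolute error is an assumed choice of grade.
    """
    return abs(int(pred) - int(target))


if __name__ == "__main__":
    target = "200"
    for pred in ("199", "900"):
        print(
            f"pred={pred} "
            f"digit_match={per_digit_score(pred, target):.2f} "
            f"number_error={whole_number_error(pred, target)}"
        )
```

In this toy case the per-digit grade prefers "900" (two of three digits match) over "199" (no digits match), even though 199 is off by only 1 and 900 is off by 700. Grading the whole number captures that closeness, which is one intuition for why the two schemes can behave differently.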