The paper argues that the fairest way to check how generally smart an AI is would be to see how quickly and how well it learns lots of different human-made games, compared with a person given the same time and practice.
This paper says we should measure an AI agent’s uncertainty across its whole conversation, not just on one final answer.
This paper turns rebuttal writing from ‘just write some text’ into ‘make a plan with proof, then write.’