This paper says that to make math-solving AIs smarter, we should train them more on the hardest questions they can almost solve.
Reward models are like scorekeepers that tell an AI which answers people prefer, and this paper builds the first big test for scorekeepers that judge both pictures and words together.