The paper fixes a common problem in training AI reasoners: models get stuck using the same favorite solution style and stop exploring new ways to solve problems.
Modern AI models can get very good at being correct, but in the process they often lose their ability to think in many different ways.