This paper teaches a model to be its own teacher so it can climb out of a learning plateau on very hard math problems.
Reasoning Palette gives a language or vision-language model a tiny hidden โmoodโ (a latent code) before it starts answering, so it chooses a smarter plan rather than just rolling dice on each next word.
This paper teaches AI models to reason better by first copying only good examples and later learning from mistakes too.