The paper trains language models to explore a wider range of ideas while reasoning, which helps them solve harder problems.
JustRL shows that a small, stable reinforcement learning (RL) recipe can substantially improve the math performance of a 1.5B-parameter language model without elaborate tricks.