The paper introduces Intervention Training (InT), a simple way for a language model to find and fix the first wrong step in its own reasoning using a short, targeted correction.
This paper shows that training a language model with reinforcement learning on just one super well-designed example can boost reasoning across many school subjects, not just math.