LLM agents are usually trained in only a few environments but are expected to work in many different, unseen ones, and this mismatch often hurts their performance.
This paper shows that training a language model with reinforcement learning on just one carefully designed example can boost reasoning across many school subjects, not just math.