One Sample to Rule Them All: Extreme Data Efficiency in RL Scaling
Beginner · Yiyuan Li, Zhen Huang et al. · Jan 6 · arXiv
This paper shows that training a language model with reinforcement learning on a single carefully designed example can boost reasoning across many school subjects, not just math.
#polymath learning #one-shot reinforcement learning #GRPO