On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models
Beginner · Charlie Zhang, Graham Neubig et al. · Dec 8 · arXiv
The paper asks when reinforcement learning (RL) genuinely improves a language model's reasoning beyond what it acquired in pre-training.
#edge of competence #process-verified evaluation #process-level rewards