Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability
Intermediate | Shobhita Sundaram, John Quan et al. | Jan 26 | arXiv
This paper teaches a model to be its own teacher so it can climb out of a learning plateau on very hard math problems.
#meta-reinforcement learning, #teacher-student self-play, #grounded rewards