Internalizing Meta-Experience into Memory for Guided Reinforcement Learning in Large Language Models
Intermediate · Shiting Huang, Zecheng Li et al. · Feb 10 · arXiv
The paper teaches large language models to do what good students do: find where they went wrong, turn that lesson into a rule, and remember it for next time.
#Reinforcement Learning with Verifiable Rewards #RLVR #Meta-Experience Learning