From Imitation to Discrimination: Toward A Generalized Curriculum Advantage Mechanism Enhancing Cross-Domain Reasoning Tasks
Intermediate · Changpeng Yang, Jinyang Wu et al. · Dec 2 · arXiv
This paper teaches AI models to reason better through a curriculum: training first imitates only good (positive-advantage) examples, then also learns to discriminate against mistakes (negative-advantage examples).
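The two-phase idea in the summary can be sketched as an advantage mask inside a policy-gradient update. This is a minimal sketch under assumptions this card does not state: GRPO-style group-normalized advantages, a hard switch at a hypothetical `switch_step`, and hypothetical function names (`group_advantages`, `curriculum_advantages`, `pg_loss`). The paper's actual schedule and objective may differ.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Group-normalized advantages (GRPO-style; an assumption, not confirmed by the card)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def curriculum_advantages(rewards, step, switch_step):
    """Hypothetical two-phase curriculum over a group of sampled responses.

    Before switch_step (imitation phase), negative advantages are zeroed so only
    better-than-average samples are reinforced. From switch_step on
    (discrimination phase), all advantages are kept, so worse-than-average
    samples are actively pushed down.
    """
    adv = group_advantages(rewards)
    if step < switch_step:
        # Imitation: learn only from good samples.
        return [a if a > 0 else 0.0 for a in adv]
    # Discrimination: learn from good samples and from mistakes.
    return adv


def pg_loss(advantages, logprobs):
    """Plain policy-gradient surrogate: minimize -mean(A * log pi)."""
    return -sum(a * lp for a, lp in zip(advantages, logprobs)) / len(advantages)
```

For a group with rewards `[1.0, 0.0, 1.0, 0.0]`, the imitation phase leaves the two failed samples with zero advantage, while the discrimination phase assigns them negative advantage. A hard switch is the simplest choice here; a gradual ramp on the weight of negative advantages would be an equally plausible variant.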
#Curriculum Advantage Policy Optimization · #advantage-based RL · #imitation learning