Adaptive Ability Decomposing for Unlocking Large Reasoning Model Effective Reinforcement Learning
Intermediate | Zhipeng Chen, Xiaobo Qin et al. | Jan 31 | arXiv
This paper teaches a model to generate its own hints (sub-questions) and then use those hints to learn more effectively with reinforcement learning in which answers are checked automatically.
#RLVR #Large Reasoning Models #Sub-question Guidance