Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models
Intermediate · Xin Xu, Clive Bai et al. · Feb 12 · arXiv
This paper shows a simple way to turn many 'too-easy' questions into harder, still-checkable ones so that AI keeps learning instead of stalling.
#Reinforcement Learning with Verifiable Rewards #Compositional prompts #Sequential Prompt Composition