Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text
Intermediate · Ximing Lu, David Acuna et al. · Jan 30 · arXiv
Golden Goose turns messy, unverifiable internet text into clean multiple-choice puzzles whose answers can be checked automatically, so models can learn from them with automatic rewards.
#Reinforcement Learning with Verifiable Rewards #Golden Goose #GooseReason-0.7M