Spurious Rewards Paradox: Mechanistically Understanding How RLVR Activates Memorization Shortcuts in LLMs
Intermediate | Lecheng Yan, Ruizhe Li et al. | Jan 16 | arXiv
The paper shows that when an LLM is trained with spurious (misleading) rewards in RLVR (reinforcement learning with verifiable rewards), it can score higher by retrieving memorized answers rather than by reasoning.
#RLVR #data contamination #memorization shortcuts