Less Noise, More Voice: Reinforcement Learning for Reasoning via Instruction Purification
Beginner | Yiju Guo, Tianyi Hu et al. | Jan 29 | arXiv
This paper shows that many reasoning failures in AI models are caused by just a few distracting words in the prompt, not by the problems being too hard.
#LENS #Interference Tokens #Reinforcement Learning with Verifiable Rewards