Beyond Correctness: Learning Robust Reasoning via Transfer
Intermediate · Hyunseok Lee, Soheil Abbasloo et al. · Feb 9 · arXiv
This paper teaches language models not just to get the final answer right but to think in a way others can reliably follow.
#Reinforcement Learning with Transferable Reward #RLTR #Reasoning Transferability