Multi-Task GRPO: Reliable LLM Reasoning Across Tasks
Intermediate · Shyam Sundhar Ramesh, Xiaotong Ji et al. · Feb 5 · arXiv
Large language models are usually trained to excel at one kind of reasoning, but real-world use requires them to perform well across many tasks at once.
#Multi-Task Learning · #GRPO · #Reinforcement Learning Post-Training