Group Distributionally Robust Optimization-Driven Reinforcement Learning for LLM Reasoning
Intermediate · Kishan Panaganti, Zhenwen Liang et al. · Jan 27 · arXiv
During RL training for reasoning, LLMs usually treat every question the same way and sample the same number of attempts (rollouts) for each one. This wastes compute on easy problems and neglects hard ones.
#LLM reasoning #Reinforcement Learning (RL) #GRPO
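You can see why uniform sampling wastes compute by looking at the group-relative advantage used in GRPO. The sketch below assumes binary correctness rewards and the common mean/standard-deviation normalization within each prompt's group. The helper `group_relative_advantages` and the `eps` value are hypothetical illustration choices. This sketch is not the paper's Group DRO method. It only shows that when every rollout for a prompt gets the same reward (all solved, or all failed), the advantages are zero and those rollouts produce no learning signal.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage (assumed form): standardize each reward
    against the mean and population std of its own rollout group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps avoids division by zero when all rewards in the group are equal
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical binary rewards (1 = correct) for 4 rollouts per prompt.
groups = {
    "easy": [1, 1, 1, 1],    # every rollout solved it
    "medium": [1, 0, 1, 0],  # mixed outcomes
    "hard": [0, 0, 0, 0],    # no rollout solved it
}

for name, rewards in groups.items():
    adv = group_relative_advantages(rewards)
    print(name, [round(a, 3) for a in adv])
# easy [0.0, 0.0, 0.0, 0.0]
# medium [1.0, -1.0, 1.0, -1.0]
# hard [0.0, 0.0, 0.0, 0.0]
```

In this example, only the "medium" group gives a nonzero gradient signal. Spending the same rollout budget on saturated easy or hard prompts therefore buys nothing under this normalization. This is the inefficiency the summary describes.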