iGRPO: Self-Feedback-Driven LLM Reasoning
Beginner · Ali Hatamizadeh, Shrimai Prabhumoye et al. · Feb 9 · arXiv
This paper teaches a language model to improve its own math answers by first writing several drafts and then learning to beat its best draft.
#iGRPO #GRPO #Reinforcement Learning