E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models
Intermediate · Shengjun Zhang, Zhang Zhang et al. · Jan 1 · arXiv
This paper shows that when image generators are trained with reinforcement learning, only a few early, very noisy (high-entropy) denoising steps actually help the model learn what people like.
#E-GRPO #Group Relative Policy Optimization #Flow Matching