Text-to-image models using GRPO used to give the same final reward to every step, which is like giving the whole team the same grade no matter who did what.
This paper shows that when teaching image generators with reinforcement learning, only a few early, very noisy steps actually help the model learn what people like.