Text-to-image models using GRPO used to give the same final reward to every step, which is like giving the whole team the same grade no matter who did what.
The paper shows that using information from many layers of a language model (not just one) helps text-to-image diffusion transformers follow prompts much better.