PromptRL teaches a language model to rewrite prompts while a flow-based image model learns to draw, and both are trained together using the same rewards.
DenseGRPO teaches image models using many small, timely rewards instead of a single final score at the end.
This paper makes video editing easier by teaching an AI to spread changes from the first frame across the whole video smoothly and accurately.
This paper shows that when teaching image generators with reinforcement learning, only a few early, very noisy steps actually help the model learn what people like.
The paper shows that many AI image generators are trained to prefer one popular idea of beauty, even when a user clearly asks for something messy, dark, blurry, or emotionally heavy.