PromptRL teaches a language model to rewrite prompts while a flow-based image model learns to draw, and both are trained together using the same rewards.
DenseGRPO teaches image models using lots of small, timely rewards instead of one final score at the end.