PromptRL teaches a language model to rewrite prompts while a flow-based image model learns to draw, and both are trained together using the same rewards.
This paper teaches teams of AI agents to get better by scoring every move they make, not just the final answer.
SPARK is a new way to train AI agents that saves compute by exploring more only at the most important moments.
GARDO is a new way to fine-tune text-to-image diffusion models with reinforcement learning without getting tricked by bad reward signals.
TreeGRPO teaches image generators using a branching tree, so each training run produces many useful learning signals instead of just one.