The paper introduces UnifiedReward-Flex, a reward model that judges images and videos the way a thoughtful human would—by flexibly changing what it checks based on the prompt and the visual evidence.
DenseGRPO teaches image models using lots of small, timely rewards instead of one final score at the end.
GARDO is a new way to fine-tune text-to-image diffusion models with reinforcement learning without getting tricked by bad reward signals.