The paper introduces UnifiedReward-Flex, a reward model that judges images and videos the way a thoughtful human wouldโby flexibly changing what it checks based on the prompt and the visual evidence.
WorldCanvas lets you make videos where things happen exactly how you ask by combining three inputs: text (what happens), drawn paths called trajectories (when and where it happens), and reference images (who it is).