This paper shows how a video generator can improve its own videos during sampling, without extra training or external verifiers.
Robots need videos that not only look good but also follow real-world physics and complete the requested task.
The paper turns video avatars from passive puppets into active doers that can plan, act, check their own work, and fix mistakes over many steps.