The paper turns video avatars from passive puppets into active doers that can plan, act, check their own work, and fix mistakes over many steps.
SWE-EVO is a new benchmark (test) that checks whether AI coding agents can upgrade real software projects over many steps, not just fix one small bug.
Robots need lots of realistic, long videos to learn, but collecting them is slow and expensive.