This paper argues that building a true world model means more than sprinkling facts into single tasks: it requires a unified system that can see, think, remember, act, and generate across many situations.
This paper teaches text-to-video models to follow real-world physics, so people, balls, water, glass, and fire act the way they should.