World models are AI tools that imagine the future so a robot can plan what to do next, but they are expensive to run many times in a row.
The paper makes long video generation much faster and lighter on memory by cutting out repeated work in attention.
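The general idea behind cutting out repeated attention work can be sketched with a simple key/value cache: previously computed attention keys and values are kept around, so each new frame only attends to them instead of re-encoding the whole history from scratch. This is an illustrative sketch only, not the paper's actual method, and every name in it (CachedSelfAttention, step, etc.) is invented for the example.

```python
# Minimal sketch, assuming a caching scheme: new tokens append their keys/values
# to a cache and attend over it, so past frames are never re-processed.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class CachedSelfAttention:
    """Single-head attention that reuses cached keys/values across steps."""
    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.wq = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.wk = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.wv = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.k_cache = np.zeros((0, dim))
        self.v_cache = np.zeros((0, dim))

    def step(self, x_new):
        # x_new: (t_new, dim) tokens for the newest frame/chunk only.
        q = x_new @ self.wq
        self.k_cache = np.vstack([self.k_cache, x_new @ self.wk])
        self.v_cache = np.vstack([self.v_cache, x_new @ self.wv])
        attn = softmax(q @ self.k_cache.T / np.sqrt(q.shape[-1]))
        # New tokens attend to all cached history without recomputing it.
        return attn @ self.v_cache

# Usage: feed frames one at a time; earlier keys/values are reused, not rebuilt.
attn = CachedSelfAttention(dim=16)
for i in range(3):
    frame_tokens = np.random.default_rng(i).standard_normal((4, 16))
    out = attn.step(frame_tokens)
print(out.shape)  # (4, 16)
```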
This paper argues that true world models come not from sprinkling knowledge onto single tasks, but from building a unified system that can see, think, remember, act, and generate across many situations.
This paper shows how a video generator can improve its own videos during sampling, without extra training or outside checkers.
This paper says modern video generators are starting to act like tiny "world simulators," not just pretty video painters.
Agents often act like tourists without a map: they react to what they see now and miss long-term consequences.
DrivingGen is a new, all-in-one test that fairly checks how well AI can imagine future driving videos and motions.
A digital twin is a living computer copy of a real thing (like a bridge, a heart, or a factory) that stays in sync with sensors and helps us predict, fix, and improve the real thing.
This paper asks if large language models (LLMs) can act like "world models" that predict what happens next in text-based environments, not just the next word in a sentence.
WorldCanvas lets you make videos where things happen exactly the way you ask by combining three inputs: text (what happens), drawn paths called trajectories (when and where it happens), and reference images (who or what is involved).
AniX is a system that lets you place any character into any 3D world and control them with plain language, like “run forward” or “play a guitar.”