DreamWorld is a new way to make videos that not only look real but also follow common-sense rules about motion, space, and meaning.
The paper argues that an AI that truly understands and simulates the real world must be consistent in three ways at once: across different senses (modal), across 3D space (spatial), and across time (temporal).
The paper builds a Computer-Using World Model (CUWM) that lets an AI “imagine” what a desktop app (like Word/Excel/PowerPoint) will look like after a click or keystroke, before performing the action for real.
The paper builds StarWM, a “world model” that lets a StarCraft II agent imagine what will happen a few seconds after it takes an action.
This paper shows a new way to predict what a phone screen will look like after you tap or scroll: generate web code (like HTML/CSS/SVG) and then render it to pixels.
LingBot-World is an open-source world model that turns video generation into an interactive, real-time simulator.
Cosmos Policy teaches robots to act by fine-tuning a powerful video model in just one training stage, without changing the model’s architecture.
This paper shows how to give AI a steady “mental map” of the world that keeps updating even when the camera looks away.
Capitalization tie-out checks if a company’s ownership table truly matches what its legal documents say.
LongVie 2 is a video world model that can generate controllable videos for 3–5 minutes while keeping the look and motion steady over time.
UniUGP is a single system that learns to understand road scenes, explain its thinking, plan safe paths, and even imagine future video frames.
Robots need lots of realistic, long videos to learn, but collecting them is slow and expensive.