Papers (12)

#world model

DreamWorld: Unified World Modeling in Video Generation

Boming Tan, Xiangdong Zhang et al. · Feb 28 · arXiv

DreamWorld is a new way to make videos that not only look real but also follow common-sense rules about motion, space, and meaning.

#video diffusion transformer #world model #optical flow

The Trinity of Consistency as a Defining Principle for General World Models

Intermediate

Jingxuan Wei, Siyuan Li et al. · Feb 26 · arXiv

The paper argues that to build an AI that truly understands and simulates the real world, it must be consistent in three ways at once: across different senses (modal), across 3D space (spatial), and across time (temporal).

#world model #trinity of consistency #modal consistency

Computer-Using World Model

Intermediate

Yiming Guan, Rui Yu et al. · Feb 19 · arXiv

The paper builds a Computer-Using World Model (CUWM) that lets an AI “imagine” what a desktop app (such as Word, Excel, or PowerPoint) will look like after a click or keystroke, before performing the action for real.

#world model #GUI agent #desktop automation

World Models for Policy Refinement in StarCraft II

Intermediate

Yixin Zhang, Ziyi Wang et al. · Feb 16 · arXiv

The paper builds StarWM, a ‘world model’ that lets a StarCraft II agent imagine what will happen a few seconds after it takes an action.

#world model #action-conditioned dynamics #StarCraft II

Generative Visual Code Mobile World Models

Intermediate

Woosung Koh, Sungjun Han et al. · Feb 2 · arXiv

This paper shows a new way to predict what a phone screen will look like after you tap or scroll: generate web code (like HTML/CSS/SVG) and then render it to pixels.

#mobile GUI #world model #vision-language model

Advancing Open-source World Models

Intermediate

Robbyant Team, Zelin Gao et al. · Jan 28 · arXiv

LingBot-World is an open-source world model that turns video generation into an interactive, real-time simulator.

#world model #video diffusion #causal attention

Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning

Intermediate

Moo Jin Kim, Yihuai Gao et al. · Jan 22 · arXiv

Cosmos Policy teaches robots to act by fine-tuning a powerful video model in just one training stage, without changing the model’s architecture.

#video diffusion #robot policy learning #visuomotor control

Flow Equivariant World Models: Memory for Partially Observed Dynamic Environments

Intermediate

Hansen Jin Lillemark, Benhao Huang et al. · Jan 3 · arXiv

This paper shows how to give AI a steady “mental map” of the world that keeps updating even when the camera looks away.

#flow equivariance #world model #partially observed environments

Does It Tie Out? Towards Autonomous Legal Agents in Venture Capital

Intermediate

Pierre Colombo, Malik Boudiaf et al. · Dec 21 · arXiv

Capitalization tie-out checks whether a company’s ownership table actually matches what its legal documents say.

#capitalization tie-out #dataroom #cap table verification

LongVie 2: Multimodal Controllable Ultra-Long Video World Model

Intermediate

Jianxiong Gao, Zhaoxi Chen et al. · Dec 15 · arXiv

LongVie 2 is a video world model that can generate controllable videos for 3–5 minutes while keeping the look and motion steady over time.

#long video generation #world model #multimodal control

UniUGP: Unifying Understanding, Generation, and Planing For End-to-end Autonomous Driving

Intermediate

Hao Lu, Ziyang Liu et al. · Dec 10 · arXiv

UniUGP is a single system that learns to understand road scenes, explain its thinking, plan safe paths, and even imagine future video frames.

#UniUGP #vision-language-action #world model

MIND-V: Hierarchical Video Generation for Long-Horizon Robotic Manipulation with RL-based Physical Alignment

Intermediate

Ruicheng Zhang, Mingyang Zhang et al. · Dec 7 · arXiv

Robots need lots of realistic, long videos to learn, but collecting them is slow and expensive.

#hierarchical video generation #robotic manipulation #long-horizon planning
