How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (12)


DreamWorld: Unified World Modeling in Video Generation

Intermediate
Boming Tan, Xiangdong Zhang et al. · Feb 28 · arXiv

DreamWorld is a new way to make videos that not only look real but also follow common-sense rules about motion, space, and meaning.

#video diffusion transformer · #world model · #optical flow

The Trinity of Consistency as a Defining Principle for General World Models

Intermediate
Jingxuan Wei, Siyuan Li et al. · Feb 26 · arXiv

The paper argues that to build an AI that truly understands and simulates the real world, it must be consistent in three ways at once: across different senses (modal), across 3D space (spatial), and across time (temporal).

#world model · #trinity of consistency · #modal consistency

Computer-Using World Model

Intermediate
Yiming Guan, Rui Yu et al. · Feb 19 · arXiv

The paper builds a Computer-Using World Model (CUWM) that lets an AI “imagine” what a desktop app (like Word/Excel/PowerPoint) will look like after a click or keystroke—before doing it for real.

#world model · #GUI agent · #desktop automation

World Models for Policy Refinement in StarCraft II

Intermediate
Yixin Zhang, Ziyi Wang et al. · Feb 16 · arXiv

The paper builds StarWM, a ‘world model’ that lets a StarCraft II agent imagine what will happen a few seconds after it takes an action.

#world model · #action-conditioned dynamics · #StarCraft II
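The core idea behind StarWM — predict the outcome of an action before committing to it — can be sketched with a toy action-conditioned dynamics function. The states, actions, and scoring below are made-up illustrations, not StarWM's actual interface:

```python
# Toy sketch of action-conditioned imagination: a world model predicts the
# next state from (state, action), and the agent scores imagined rollouts
# before acting for real. All names here are illustrative, not from StarWM.

def dynamics(state, action):
    """Stand-in for a learned model: predicts the next state after an action."""
    minerals, army = state
    if action == "build_workers":
        return (minerals + 8, army)        # more income, no army growth
    if action == "train_army":
        return (minerals - 5, army + 2)    # spend minerals, grow army
    return (minerals, army)                # no-op

def imagine(state, plan):
    """Roll the model forward through a sequence of actions ("imagination")."""
    for action in plan:
        state = dynamics(state, action)
    return state

def choose_plan(state, candidate_plans):
    """Pick the plan whose imagined outcome maximizes army without going broke."""
    def score(s):
        minerals, army = s
        return army if minerals >= 0 else -1_000
    return max(candidate_plans, key=lambda p: score(imagine(state, p)))

start = (10, 0)
plans = [
    ["train_army", "train_army", "train_army"],      # imagined rollout goes broke
    ["build_workers", "train_army", "train_army"],   # economy first, then army
]
print(choose_plan(start, plans))
```

The point of the pattern is that mistakes happen in imagination, where they are free, rather than in the environment.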

Generative Visual Code Mobile World Models

Intermediate
Woosung Koh, Sungjun Han et al. · Feb 2 · arXiv

This paper shows a new way to predict what a phone screen will look like after you tap or scroll: generate web code (like HTML/CSS/SVG) and then render it to pixels.

#mobile GUI · #world model · #vision-language model
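The "predict the next screen as code" idea can be sketched in miniature: instead of generating pixels directly, emit renderable markup (SVG here) describing the predicted UI state after an action, and let a rendering engine turn it into pixels. The screen state, action names, and layout below are hypothetical, not the paper's model:

```python
# Sketch of code-as-world-model for GUIs: predict a symbolic screen state,
# then render it to markup. A browser engine would rasterize the SVG; the
# state fields and actions are made up for illustration.

def predict_next_screen(screen, action):
    """Stand-in for a learned world-model step over a symbolic screen state."""
    screen = dict(screen)
    if action == "tap_like":
        screen["likes"] += 1
    elif action == "scroll_down":
        screen["offset"] += 100
    return screen

def render_svg(screen):
    """Render the symbolic state to SVG markup (code, not pixels)."""
    return (
        f'<svg width="320" height="640">'
        f'<text x="10" y="{30 - screen["offset"]}">Likes: {screen["likes"]}</text>'
        f"</svg>"
    )

state = {"likes": 41, "offset": 0}
state = predict_next_screen(state, "tap_like")
print(render_svg(state))
```

Generating code instead of pixels keeps text crisp and layout exact, since the renderer, not the generative model, handles rasterization.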

Advancing Open-source World Models

Intermediate
Robbyant Team, Zelin Gao et al. · Jan 28 · arXiv

LingBot-World is an open-source world model that turns video generation into an interactive, real-time simulator.

#world model · #video diffusion · #causal attention

Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning

Intermediate
Moo Jin Kim, Yihuai Gao et al. · Jan 22 · arXiv

Cosmos Policy teaches robots to act by fine-tuning a powerful video model in just one training stage, without changing the model’s architecture.

#video diffusion · #robot policy learning · #visuomotor control

Flow Equivariant World Models: Memory for Partially Observed Dynamic Environments

Intermediate
Hansen Jin Lillemark, Benhao Huang et al. · Jan 3 · arXiv

This paper shows how to give AI a steady “mental map” of the world that keeps updating even when the camera looks away.

#flow equivariance · #world model · #partially observed environments
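The "steady mental map" intuition can be sketched as a pose-stabilized memory: observations are written into a world-frame map by undoing the camera's own motion, so objects keep stable coordinates even while the camera looks elsewhere. The map, poses, and labels below are toy illustrations, not the paper's architecture:

```python
# Toy sketch of a motion-compensated world memory: each observation, made
# relative to the camera, is stored in world coordinates, so camera motion
# does not disturb what is already remembered. Names are illustrative only.

class WorldMemory:
    def __init__(self):
        self.map = {}          # world coordinate -> last observed label
        self.camera = (0, 0)   # camera position in the world frame

    def move_camera(self, dx, dy):
        x, y = self.camera
        self.camera = (x + dx, y + dy)

    def observe(self, rel_pos, label):
        """Store a camera-relative observation at its world coordinate."""
        cx, cy = self.camera
        rx, ry = rel_pos
        self.map[(cx + rx, cy + ry)] = label

    def recall(self, world_pos):
        return self.map.get(world_pos)

mem = WorldMemory()
mem.observe((2, 0), "cup")    # cup seen 2 units ahead -> stored at world (2, 0)
mem.move_camera(5, 5)         # look away
print(mem.recall((2, 0)))     # the memory of the cup persists
```

Without the pose compensation in `observe`, every camera movement would smear or overwrite the stored map, which is the failure mode such memories are designed to avoid.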

Does It Tie Out? Towards Autonomous Legal Agents in Venture Capital

Intermediate
Pierre Colombo, Malik Boudiaf et al. · Dec 21 · arXiv

Capitalization tie-out checks if a company’s ownership table truly matches what its legal documents say.

#capitalization tie-out · #dataroom · #cap table verification

LongVie 2: Multimodal Controllable Ultra-Long Video World Model

Intermediate
Jianxiong Gao, Zhaoxi Chen et al. · Dec 15 · arXiv

LongVie 2 is a video world model that can generate controllable videos for 3–5 minutes while keeping the look and motion steady over time.

#long video generation · #world model · #multimodal control

UniUGP: Unifying Understanding, Generation, and Planning For End-to-end Autonomous Driving

Intermediate
Hao Lu, Ziyang Liu et al. · Dec 10 · arXiv

UniUGP is a single system that learns to understand road scenes, explain its thinking, plan safe paths, and even imagine future video frames.

#UniUGP · #vision-language-action · #world model

MIND-V: Hierarchical Video Generation for Long-Horizon Robotic Manipulation with RL-based Physical Alignment

Intermediate
Ruicheng Zhang, Mingyang Zhang et al. · Dec 7 · arXiv

Robots need lots of realistic, long videos to learn, but collecting them is slow and expensive; MIND-V generates such long-horizon manipulation videos hierarchically, using reinforcement learning to keep them physically plausible.

#hierarchical video generation · #robotic manipulation · #long-horizon planning