How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (14)

Filtered by tag: #world models

Next Embedding Prediction Makes World Models Stronger

Intermediate
George Bredis, Nikita Balagansky et al. · Mar 3 · arXiv

NE-Dreamer is a model-based reinforcement learning agent that skips rebuilding pixels and instead learns by predicting the next step’s hidden features.

#model-based reinforcement learning · #world models · #next-embedding prediction
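The core idea, predicting the next step's hidden features instead of rebuilding pixels, can be illustrated with a toy sketch. Everything here (the shapes, the random weights, the `encode` function) is a hypothetical stand-in, not NE-Dreamer's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(obs, W_enc):
    """Toy encoder: project an observation into a low-dimensional embedding."""
    return np.tanh(obs @ W_enc)

# Hypothetical shapes: 64-dim "pixel" observations, 8-dim embeddings.
obs_t, obs_next = rng.normal(size=(2, 64))
W_enc = rng.normal(size=(64, 8)) * 0.1
W_pred = rng.normal(size=(8, 8)) * 0.1   # predictor acting on embeddings

z_t = encode(obs_t, W_enc)
z_next_target = encode(obs_next, W_enc)  # embedding of the real next step
z_next_pred = z_t @ W_pred               # predicted next embedding

# Training signal: match the next embedding, never the raw pixels.
embedding_loss = np.mean((z_next_pred - z_next_target) ** 2)
print(float(embedding_loss))
```

The point of the sketch is the loss target: an 8-dim embedding rather than a 64-dim pixel vector, which is what lets such agents skip the expensive reconstruction step.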

Multi-agent cooperation through in-context co-player inference

Intermediate
Marissa A. Weis, Maciej Wołczyk et al. · Feb 18 · arXiv

The paper shows that AI agents can learn to cooperate simply by playing with many different kinds of co-players and figuring them out on the fly, without hardcoding how those co-players learn.

#multi-agent reinforcement learning · #in-context learning · #co-player inference

MIND: Benchmarking Memory Consistency and Action Control in World Models

Intermediate
Yixuan Ye, Xuanyu Lu et al. · Feb 8 · arXiv

MIND is a new benchmark that fairly tests two core skills of world models: remembering the world over time (memory consistency) and following controls exactly (action control).

#world models · #memory consistency · #action control

An Empirical Study of World Model Quantization

Intermediate
Zhongqian Fu, Tianyi Zhao et al. · Feb 2 · arXiv

World models are AI tools that imagine the future so a robot can plan what to do next, but they are expensive to run many times in a row; this paper studies how post-training quantization can cut that cost.

#world models · #post-training quantization · #DINO-WM
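Post-training quantization, one of the tags above, fits in a few lines. The symmetric int8 scheme below is a generic illustration of the technique, not the study's exact setup; the weight matrix is random stand-in data:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor post-training quantization to int8."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate float weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(1)
w = rng.normal(size=(4, 4)).astype(np.float32)  # stand-in for world-model weights

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# Rounding error is bounded by half a quantization step per element.
max_err = float(np.max(np.abs(w - w_hat)))
print(q.dtype, max_err)
```

Storing `q` instead of `w` shrinks the weights 4x (int8 vs float32), which is the kind of saving that matters when a world model must be rolled out many steps in a row.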

Fast Autoregressive Video Diffusion and World Models with Temporal Cache Compression and Sparse Attention

Intermediate
Dvir Samuel, Issar Tzachor et al. · Feb 2 · arXiv

The paper makes long video generation much faster and lighter on memory by cutting out repeated work in attention.

#autoregressive video diffusion · #KV cache compression · #sparse attention
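A minimal sketch of the kind of cache budgeting the tags point at: a fixed-size sliding window over the attention KV cache, so memory stays constant as the generated video grows. The window size and shapes here are made up for illustration, and the paper's compression scheme is more involved than simple eviction:

```python
import numpy as np

WINDOW = 4  # hypothetical cache budget: keep only the last 4 steps

def append_kv(cache_k, cache_v, k, v, window=WINDOW):
    """Append a new key/value pair, evicting the oldest entries beyond the window."""
    cache_k = np.concatenate([cache_k, k[None]], axis=0)[-window:]
    cache_v = np.concatenate([cache_v, v[None]], axis=0)[-window:]
    return cache_k, cache_v

rng = np.random.default_rng(2)
d = 8
cache_k = np.empty((0, d))
cache_v = np.empty((0, d))

for step in range(10):  # generate 10 steps; the cache stays bounded
    k, v = rng.normal(size=(2, d))
    cache_k, cache_v = append_kv(cache_k, cache_v, k, v)

print(cache_k.shape)  # memory cost is O(window), not O(sequence length)
```

After 10 generation steps the cache still holds only 4 entries, which is why this family of tricks makes long autoregressive rollouts "lighter on memory."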

Research on World Models Is Not Merely Injecting World Knowledge into Specific Tasks

Intermediate
Bohan Zeng, Kaixin Zhu et al. · Feb 2 · arXiv

This paper argues that true world models are not just sprinkling facts into single tasks, but building a unified system that can see, think, remember, act, and generate across many situations.

#world models · #unified framework · #multimodal reasoning

Self-Refining Video Sampling

Intermediate
Sangwon Jang, Taekyung Ki et al. · Jan 26 · arXiv

This paper shows how a video generator can improve its own videos during sampling, without extra training or outside checkers.

#video generation · #flow matching · #denoising autoencoder

A Mechanistic View on Video Generation as World Models: State and Dynamics

Intermediate
Luozhou Wang, Zhifei Chen et al. · Jan 22 · arXiv

This paper says modern video generators are starting to act like tiny "world simulators," not just pretty video painters.

#world models · #video generation · #state representation

Imagine-then-Plan: Agent Learning from Adaptive Lookahead with World Models

Intermediate
Youwei Liu, Jian Wang et al. · Jan 13 · arXiv

Agents often act like tourists without a map: they react to what they see now and miss long-term consequences. Imagine-then-Plan gives them a world model with adaptive lookahead so they can rehearse the future before acting.

#Imagine-then-Plan · #world models · #adaptive lookahead

DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving

Intermediate
Yang Zhou, Hao Shao et al. · Jan 4 · arXiv

DrivingGen is a new, all-in-one test that fairly checks how well AI can imagine future driving videos and motions.

#generative video · #autonomous driving · #world models

Digital Twin AI: Opportunities and Challenges from Large Language Models to World Models

Intermediate
Rong Zhou, Dongping Chen et al. · Jan 4 · arXiv

A digital twin is a living computer copy of a real thing (like a bridge, a heart, or a factory) that stays in sync with sensors and helps us predict, fix, and improve the real thing.

#digital twin · #physics-informed AI · #neural operators

From Word to World: Can Large Language Models be Implicit Text-based World Models?

Intermediate
Yixia Li, Hongru Wang et al. · Dec 21 · arXiv

This paper asks if large language models (LLMs) can act like "world models" that predict what happens next in text-based environments, not just the next word in a sentence.

#world models · #next-state prediction · #text-based environments