How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (6)

#autoregressive diffusion

Helios: Real Real-Time Long Video Generation Model

Intermediate
Shenghai Yuan, Yuanyang Yin et al. | Mar 4 | arXiv

Helios is a 14-billion-parameter video model that can make minute-long videos in real time at about 19.5 frames per second on a single NVIDIA H100 GPU.

#real-time video generation, #long video diffusion, #autoregressive diffusion

MIND: Benchmarking Memory Consistency and Action Control in World Models

Intermediate
Yixuan Ye, Xuanyu Lu et al. | Feb 8 | arXiv

MIND is a new benchmark that fairly tests two core skills of world models: remembering the world over time (memory consistency) and following controls exactly (action control).

#world models, #memory consistency, #action control

Geometry-Aware Rotary Position Embedding for Consistent Video World Model

Intermediate
Chendong Xiang, Jiajun Liu et al. | Feb 8 | arXiv

The paper fixes a common problem in video world models: scenes slowly change or "drift" when the camera moves and comes back.

#ViewRope, #geometry-aware attention, #rotary position embedding

Causal World Modeling for Robot Control

Intermediate
Lin Li, Qihang Zhang et al. | Jan 29 | arXiv

Robots used to copy actions from videos without truly understanding how the world changes, so they often messed up long, multi-step jobs.

#robot world model, #autoregressive diffusion, #causal masking

MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives

Intermediate
Sihui Ji, Xi Chen et al. | Dec 16 | arXiv

MemFlow is a new way for AI to remember the right parts of a long video story while it keeps making new parts, so characters and scenes stay consistent.

#MemFlow, #Narrative Adaptive Memory, #Sparse Memory Activation

LongVie 2: Multimodal Controllable Ultra-Long Video World Model

Intermediate
Jianxiong Gao, Zhaoxi Chen et al. | Dec 15 | arXiv

LongVie 2 is a video world model that can generate controllable videos for 3 to 5 minutes while keeping the look and motion steady over time.

#long video generation, #world model, #multimodal control