How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (9)

#video diffusion transformer

DreamWorld: Unified World Modeling in Video Generation

Intermediate
Boming Tan, Xiangdong Zhang et al. · Feb 28 · arXiv

DreamWorld is a new way to make videos that not only look real but also follow common-sense rules about motion, space, and meaning.

#video diffusion transformer · #world model · #optical flow

Solaris: Building a Multiplayer Video World Model in Minecraft

Intermediate
Georgy Savva, Oscar Michel et al. · Feb 25 · arXiv

Solaris is a new AI model that can predict the future video of two Minecraft players at the same time, keeping both camera views consistent with each other.

#multiplayer world model · #video diffusion transformer · #Minecraft dataset

Generated Reality: Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control

Intermediate
Linxi Xie, Lisong C. Sun et al. · Feb 20 · arXiv

This paper builds a "generated reality" system that lets AI-made videos react to your real head and hand movements in VR.

#generated reality · #hand pose conditioning · #video diffusion transformer

FSVideo: Fast Speed Video Diffusion Model in a Highly-Compressed Latent Space

Intermediate
FSVideo Team, Qingyu Chen et al. · Feb 2 · arXiv

FSVideo is a new image-to-video generator that runs about 42× faster than popular open-source models while keeping similar visual quality.

#FSVideo · #image-to-video · #video diffusion transformer

SALAD: Achieve High-Sparsity Attention via Efficient Linear Attention Tuning for Video Diffusion Transformer

Intermediate
Tongcheng Fang, Hanling Zhang et al. · Jan 23 · arXiv

Videos are made of very long lists of tokens, and regular attention looks at every pair of tokens, which is slow and expensive.

#SALAD · #sparse attention · #linear attention
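
To make the "every pair of tokens" cost in the SALAD summary concrete, here is a minimal sketch that counts the query-key pairs full self-attention scores for a sequence. The function name `attention_pairs` and the token counts are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def attention_pairs(num_tokens: int) -> int:
    """Number of query-key pairs scored by full (dense) self-attention."""
    # Every token attends to every token, so the count grows quadratically.
    return num_tokens * num_tokens


# Hypothetical sequence lengths, for illustration only.
for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} tokens -> {attention_pairs(n):>15,} pairs")
```

Doubling the number of tokens quadruples the number of pairs, which is why long video token sequences motivate sparse or linear attention variants.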

Plenoptic Video Generation

Intermediate
Xiao Fu, Shitao Tang et al. · Jan 8 · arXiv

PlenopticDreamer is a new way to remake a video from different camera paths while keeping everything consistent across views and over time.

#plenoptic function · #camera-controlled video generation · #video re-rendering

StereoPilot: Learning Unified and Efficient Stereo Conversion via Generative Priors

Intermediate
Guibao Shen, Yihua Du et al. · Dec 18 · arXiv

StereoPilot is a new AI that turns regular 2D videos into 3D (stereo) videos quickly and with high quality.

#stereo video conversion · #monocular-to-stereo · #depth ambiguity

Towards Interactive Intelligence for Digital Humans

Intermediate
Yiyi Cai, Xuangeng Chu et al. · Dec 15 · arXiv

Digital humans used to just copy motions; this paper makes them think, speak, and move in sync like real people.

#interactive intelligence · #digital human · #multimodal avatar

V-RGBX: Video Editing with Accurate Controls over Intrinsic Properties

Intermediate
Ye Fang, Tong Wu et al. · Dec 12 · arXiv

V-RGBX is a new video editing system that lets you change the true building blocks of a scene—like base color, surface bumps, material, and lighting—rather than just painting over pixels.

#intrinsic video editing · #inverse rendering · #forward rendering