🎓How I Study AIHISA
📖Read
📄Papers📰Blogs🎬Courses
💡Learn
🛤️Paths📚Topics💡Concepts🎴Shorts
🎯Practice
🧩Problems🎯Prompts🧠Review
Search
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers10

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#video generation

Self-Refining Video Sampling

Intermediate
Sangwon Jang, Taekyung Ki et al.Jan 26arXiv

This paper shows how a video generator can improve its own videos during sampling, without extra training or outside checkers.

#video generation#flow matching#denoising autoencoder

SkyReels-V3 Technique Report

Intermediate
Debang Li, Zhengcong Fei et al.Jan 24arXiv

SkyReels-V3 is a single AI model that can make videos in three ways: from reference images, by extending an existing video, and by creating talking avatars from audio.

#video generation#diffusion transformer#multimodal in-context learning

A Mechanistic View on Video Generation as World Models: State and Dynamics

Intermediate
Luozhou Wang, Zhifei Chen et al.Jan 22arXiv

This paper says modern video generators are starting to act like tiny "world simulators," not just pretty video painters.

#world models#video generation#state representation

Inference-time Physics Alignment of Video Generative Models with Latent World Models

Intermediate
Jianhao Yuan, Xiaofeng Zhang et al.Jan 15arXiv

This paper teaches video-making AIs to follow real-world physics better without retraining them.

#video generation#physics plausibility#latent world model

DreamID-V:Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer

Intermediate
Xu Guo, Fulong Ye et al.Jan 4arXiv

DreamID-V is a new AI method that swaps faces in videos while keeping the body movements, expressions, lighting, and background steady and natural.

#video face swapping#image face swapping#diffusion transformer

SVBench: Evaluation of Video Generation Models on Social Reasoning

Beginner
Wenshuo Peng, Gongxuan Wang et al.Dec 25arXiv

SVBench is the first benchmark that checks whether video generation models can show realistic social behavior, not just pretty pictures.

#social reasoning#video generation#benchmark

DreaMontage: Arbitrary Frame-Guided One-Shot Video Generation

Intermediate
Jiawei Liu, Junqiao Li et al.Dec 24arXiv

DreaMontage is a new AI method that makes long, single-shot videos that feel smooth and connected, even when you give it scattered images or short clips in the middle.

#arbitrary frame conditioning#one-shot video generation#Diffusion Transformer

Kling-Omni Technical Report

Intermediate
Kling Team, Jialu Chen et al.Dec 18arXiv

Kling-Omni is a single, unified model that can understand text, images, and videos together and then make or edit high-quality videos from those mixed instructions.

#multimodal visual language#MVL#prompt enhancer

Spatia: Video Generation with Updatable Spatial Memory

Intermediate
Jinjing Zhao, Fangyun Wei et al.Dec 17arXiv

Spatia is a video generator that keeps a live 3D map of the scene (a point cloud) as its memory while making videos.

#video generation#spatial memory#3D point cloud

UniUGP: Unifying Understanding, Generation, and Planing For End-to-end Autonomous Driving

Intermediate
Hao Lu, Ziyang Liu et al.Dec 10arXiv

UniUGP is a single system that learns to understand road scenes, explain its thinking, plan safe paths, and even imagine future video frames.

#UniUGP#vision-language-action#world model