How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers: 17

#classifier-free guidance

PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss

Intermediate
Zehong Ma, Ruihan Xu et al. · Feb 2 · arXiv

PixelGen is a new image generator that works directly with pixels and uses a perceptual loss, a training signal that tracks what looks good to people, to improve quality.

#pixel diffusion #perceptual loss #LPIPS

Self-Refining Video Sampling

Intermediate
Sangwon Jang, Taekyung Ki et al. · Jan 26 · arXiv

This paper shows how a video generator can improve its own videos during sampling, without extra training or outside checkers.

#video generation #flow matching #denoising autoencoder

Alterbute: Editing Intrinsic Attributes of Objects in Images

Intermediate
Tal Reiss, Daniel Winter et al. · Jan 15 · arXiv

Alterbute is a diffusion-based method that changes an object's intrinsic attributes (color, texture, material, shape) in a photo while keeping the object's identity and the scene intact.

#intrinsic attribute editing #visual named entities #identity preservation

Future Optical Flow Prediction Improves Robot Control & Video Generation

Intermediate
Kanchana Ranasinghe, Honglu Zhou et al. · Jan 15 · arXiv

FOFPred is a new AI that reads one or two images plus a short instruction like “move the bottle left to right,” and then predicts how every pixel will move in the next moments.

#optical flow #future optical flow prediction #vision-language model

Transition Matching Distillation for Fast Video Generation

Intermediate
Weili Nie, Julius Berner et al. · Jan 14 · arXiv

Big video makers (diffusion models) create great videos but are too slow because they use hundreds of tiny clean-up steps.

#video diffusion #distillation #transition matching

VideoAR: Autoregressive Video Generation via Next-Frame & Scale Prediction

Intermediate
Longbin Ji, Xiaoxiong Liu et al. · Jan 9 · arXiv

VideoAR is a new way to make videos with AI that writes each frame like a story, one step at a time, while painting details from coarse to fine.

#autoregressive video generation #visual autoregression #next-frame prediction

Boosting Latent Diffusion Models via Disentangled Representation Alignment

Intermediate
John Page, Xuesong Niu et al. · Jan 9 · arXiv

This paper shows that the best VAEs for image generation are the ones whose latents neatly separate object attributes, a property called semantic disentanglement.

#Send-VAE #semantic disentanglement #latent diffusion

LTX-2: Efficient Joint Audio-Visual Foundation Model

Intermediate
Yoav HaCohen, Benny Brazowski et al. · Jan 6 · arXiv

LTX-2 is an open-source model that makes video and sound together from a text prompt, so the picture and audio match in time and meaning.

#text-to-video #text-to-audio #audiovisual generation

Over++: Generative Video Compositing for Layer Interaction Effects

Intermediate
Luchao Qi, Jiaye Wu et al. · Dec 22 · arXiv

Over++ is a video AI that adds realistic effects like shadows, splashes, dust, and smoke between a foreground and a background without changing the original footage.

#augmented compositing #video diffusion #video inpainting

EasyV2V: A High-quality Instruction-based Video Editing Framework

Intermediate
Jinjie Mai, Chaoyang Wang et al. · Dec 18 · arXiv

EasyV2V is a simple but powerful system that edits videos by following plain-language instructions like “make the shirt blue starting at 2 seconds.”

#instruction-based video editing #spatiotemporal mask #text-to-video fine-tuning

REGLUE Your Latents with Global and Local Semantics for Entangled Diffusion

Intermediate
Giorgos Petsangourakis, Christos Sgouropoulos et al. · Dec 18 · arXiv

Latent diffusion models are great at making images but learn the meaning of scenes slowly because their training goal mostly teaches them to clean up noise, not to understand objects and layouts.

#latent diffusion #REGLUE #representation entanglement

Feedforward 3D Editing via Text-Steerable Image-to-3D

Intermediate
Ziqi Ma, Hongqiao Chen et al. · Dec 15 · arXiv

Steer3D lets you change a 3D object just by typing what you want, like “add a roof rack,” and it does it in one quick pass.

#3D editing #image-to-3D #ControlNet