Papers (25)

#classifier-free guidance

MIBURI: Towards Expressive Interactive Gesture Synthesis

M. Hamza Mughal, Rishabh Dabral et al. · Mar 3 · arXiv

MIBURI is a system that makes a talking digital character move its body and face expressively in real time while it speaks.

#co-speech gesture synthesis #embodied conversational agents #causal generation

DreamWorld: Unified World Modeling in Video Generation

Intermediate

Boming Tan, Xiangdong Zhang et al. · Feb 28 · arXiv

DreamWorld is a new way to make videos that not only look real but also follow common-sense rules about motion, space, and meaning.

#video diffusion transformer #world model #optical flow

SenCache: Accelerating Diffusion Model Inference via Sensitivity-Aware Caching

Intermediate

Yasaman Haghighi, Alexandre Alahi · Feb 27 · arXiv

SenCache speeds up video diffusion models by reusing past answers only when the model is predicted to change very little.

#diffusion models #video generation #caching

Accelerating Diffusion via Hybrid Data-Pipeline Parallelism Based on Conditional Guidance Scheduling

Intermediate

Euisoo Jung, Byunghyun Kim et al. · Feb 25 · arXiv

Diffusion models produce high-quality images but are slow because they remove noise iteratively over many steps.

#diffusion inference #multi-GPU acceleration #data parallelism

SARAH: Spatially Aware Real-time Agentic Humans

Intermediate

Evonne Ng, Siwei Zhang et al. · Feb 20 · arXiv

SARAH is a real-time system that makes virtual characters move their whole bodies naturally during a conversation while knowing where the user is.

#spatially aware motion #real-time avatars #causal transformer

MolHIT: Advancing Molecular-Graph Generation with Hierarchical Discrete Diffusion Models

Intermediate

Hojung Jung, Rodrigo Hormazabal et al. · Feb 19 · arXiv

MolHIT is a new AI that builds molecules as graphs, moving from broad chemical groups to exact atoms step by step.

#molecular graph generation #discrete diffusion #hierarchical diffusion

World Action Models are Zero-shot Policies

Intermediate

Seonghyeon Ye, Yunhao Ge et al. · Feb 17 · arXiv

DreamZero is a robot brain that learns actions by predicting short videos of the future and the matching moves at the same time.

#World Action Models #DreamZero #video diffusion

DreamID-Omni: Unified Framework for Controllable Human-Centric Audio-Video Generation

Intermediate

Xu Guo, Fulong Ye et al. · Feb 12 · arXiv

DreamID-Omni is one model that can create, edit, and animate human-centered videos with matching voices, all in sync.

#audio-video generation #diffusion transformer #identity preservation

PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss

Intermediate

Zehong Ma, Ruihan Xu et al. · Feb 2 · arXiv

PixelGen is a new image generator that works directly with pixels and uses a perceptual loss, which scores images by how good they look to people, to improve quality.

#pixel diffusion #perceptual loss #LPIPS

Self-Refining Video Sampling

Intermediate

Sangwon Jang, Taekyung Ki et al. · Jan 26 · arXiv

This paper shows how a video generator can improve its own videos during sampling, without extra training or outside checkers.

#video generation #flow matching #denoising autoencoder

Alterbute: Editing Intrinsic Attributes of Objects in Images

Intermediate

Tal Reiss, Daniel Winter et al. · Jan 15 · arXiv

Alterbute is a diffusion-based method that changes an object's intrinsic attributes (color, texture, material, shape) in a photo while keeping the object's identity and the scene intact.

#intrinsic attribute editing #visual named entities #identity preservation

Future Optical Flow Prediction Improves Robot Control & Video Generation

Intermediate

Kanchana Ranasinghe, Honglu Zhou et al. · Jan 15 · arXiv

FOFPred is a new AI that reads one or two images plus a short instruction like “move the bottle left to right,” and then predicts how every pixel will move over the next few moments.

#optical flow #future optical flow prediction #vision-language model
