How I Study AI - Learn AI Papers & Lectures the Easy Way

One-step Latent-free Image Generation with Pixel Mean Flows

Beginner

Yiyang Lu, Susie Lu et al. · Jan 29 · arXiv

This paper shows how to generate a whole picture in a single step, directly in pixels, without using a hidden “latent” space or many small sampling steps.

#pixel MeanFlow, #one-step generation, #x-prediction

VINO: A Unified Visual Generator with Interleaved OmniModal Context

Beginner

Junyi Chen, Tong He et al. · Jan 5 · arXiv

VINO is a single AI model that can make and edit both images and videos by following text instructions and looking at reference pictures and clips at the same time.

#VINO, #unified visual generator, #multimodal diffusion transformer

LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation

Beginner

Ethan Chern, Zhulin Hu et al. · Dec 29 · arXiv

LiveTalk turns slow, many-step video diffusion into a fast, 4-step, real-time system for talking avatars that listen, think, and respond with synchronized video.

#real-time video diffusion, #on-policy distillation, #multimodal conditioning
