Papers2

All Beginner Intermediate Advanced

All Sources arXiv

#VAE latents

DREAM: Where Visual Understanding Meets Text-to-Image Generation

Beginner

Chao Li, Tianhong Li et al.Mar 3arXiv

DREAM is one model that both understands images (like CLIP) and makes images from text (like top text-to-image models).

#DREAM#contrastive learning#masked autoregressive modeling

Not triaged yet

VINO: A Unified Visual Generator with Interleaved OmniModal Context

Beginner

Junyi Chen, Tong He et al.Jan 5arXiv

VINO is a single AI model that can make and edit both images and videos by listening to text and looking at reference pictures and clips at the same time.

#VINO#unified visual generator#multimodal diffusion transformer

Not triaged yet