Papers (5)

#FID

DREAM: Where Visual Understanding Meets Text-to-Image Generation

Beginner

Chao Li, Tianhong Li et al. · Mar 3 · arXiv

DREAM is a single model that both understands images (as CLIP does) and generates images from text (as leading text-to-image models do).

#DREAM #contrastive learning #masked autoregressive modeling

Image Generation with a Sphere Encoder

Beginner

Kaiyu Yue, Menglin Jia et al. · Feb 16 · arXiv

The Sphere Encoder generates images quickly by training an autoencoder to spread all images evenly over a sphere in latent space and then decoding random points on that sphere back into pictures.

#Sphere Encoder #Spherical Latent Space #RMS Normalization

FrankenMotion: Part-level Human Motion Generation and Composition

Beginner

Chuqiao Li, Xianghui Xie et al. · Jan 15 · arXiv

FrankenMotion generates human motion by controlling each body part over time, like a careful puppeteer.

#Human motion generation #Part-level control #Hierarchical conditioning

Guiding a Diffusion Transformer with the Internal Dynamics of Itself

Beginner

Xingyu Zhou, Qifan Li et al. · Dec 30 · arXiv

This paper shows a simple way to make image-generating AIs (diffusion Transformers) produce clearer, more accurate pictures by letting the model guide itself from the inside.

#Internal Guidance #Diffusion Transformer #Intermediate Supervision

Trainable Log-linear Sparse Attention for Efficient Diffusion Transformers

Beginner

Yifan Zhou, Zeqi Xiao et al. · Dec 18 · arXiv

This paper introduces Log-linear Sparse Attention (LLSA), which lets Diffusion Transformers attend only to the most useful information by using a layered, hierarchical search.

#Log-linear Sparse Attention #Hierarchical Top-K #Hierarchical KV Enrichment