How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (8)

#Classifier-Free Guidance

Semantic Routing: Exploring Multi-Layer LLM Feature Weighting for Diffusion Transformers

Intermediate
Bozhou Li, Yushuo Guan et al. · Feb 3 · arXiv

The paper shows that conditioning a text-to-image diffusion transformer on information from many layers of the language model (not just one) helps it follow prompts much more faithfully (a rough sketch of the idea follows below).

#Diffusion Transformer#Text Conditioning#Multi-layer LLM Features

Balancing Understanding and Generation in Discrete Diffusion Models

Intermediate
Yue Liu, Yuzhong Zhao et al. · Feb 1 · arXiv

This paper introduces XDLM, a single model that blends two popular discrete diffusion styles (masked and uniform) so that it both understands and generates text and images well (a toy version of such a blended corruption step is sketched below).

#XDLM #discrete diffusion #stationary noise kernel
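
A hedged toy version of a corruption kernel that interpolates between masked and uniform discrete diffusion; the mixing rule and parameter names are assumptions for illustration, not XDLM's actual noise kernel.

```python
import numpy as np

def corrupt_tokens(tokens, t, lam, vocab_size, mask_id, rng):
    """Corrupt a token sequence at noise level t in [0, 1].

    With probability t each token is corrupted; a corrupted token becomes
    [MASK] with probability lam (masked-diffusion style) or a uniformly
    random vocabulary token with probability 1 - lam (uniform-diffusion style).
    lam=1 recovers pure masked diffusion, lam=0 pure uniform diffusion.
    """
    tokens = np.asarray(tokens)
    corrupt = rng.random(tokens.shape) < t
    use_mask = rng.random(tokens.shape) < lam
    random_tok = rng.integers(0, vocab_size, size=tokens.shape)
    noised = np.where(use_mask, mask_id, random_tok)
    return np.where(corrupt, noised, tokens)

rng = np.random.default_rng(0)
x = np.arange(10)  # toy "sentence" of token ids
print(corrupt_tokens(x, t=0.5, lam=0.7, vocab_size=100, mask_id=101, rng=rng))
```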

Guiding a Diffusion Transformer with the Internal Dynamics of Itself

Beginner
Xingyu Zhou, Qifan Li et al. · Dec 30 · arXiv

This paper shows a simple way to make image-generating diffusion transformers produce clearer, more accurate pictures by letting the model guide itself from the inside (one possible form of such guidance is sketched below).

#Internal Guidance #Diffusion Transformer #Intermediate Supervision
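
The blurb does not spell out the exact guidance signal, so this sketch assumes a classifier-free-guidance-style extrapolation in which the "weak" branch is a prediction read off the model's own intermediate layers rather than a separate unconditional pass; treat it as an illustration, not the paper's method.

```python
import numpy as np

def self_guided_prediction(full_pred, internal_pred, guidance_scale):
    """CFG-style extrapolation using the model's own internal prediction.

    full_pred:      noise/velocity prediction from the full network.
    internal_pred:  a weaker prediction read off an intermediate layer
                    (assumption: it plays the role of the unconditional branch).
    guidance_scale: 1.0 means no guidance; larger values push the output
                    further in the direction the full model adds on top of
                    its own internal dynamics.
    """
    return internal_pred + guidance_scale * (full_pred - internal_pred)

# Toy usage with random tensors standing in for model outputs.
rng = np.random.default_rng(0)
full = rng.standard_normal((4, 4))
internal = rng.standard_normal((4, 4))
guided = self_guided_prediction(full, internal, guidance_scale=2.0)
print(guided.shape)  # (4, 4)
```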

DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models

Beginner
Zefeng He, Xiaoye Qu et al. · Dec 30 · arXiv

DiffThinker turns hard picture-based puzzles into an image-to-image drawing task instead of a long text-generation task.

#DiffThinker #Generative Multimodal Reasoning #Diffusion Models

StageVAR: Stage-Aware Acceleration for Visual Autoregressive Models

Intermediate
Senmao Li, Kai Wang et al. · Dec 18 · arXiv

StageVAR makes image-generating AI much faster by recognizing that the early generation steps set the meaning and global structure, while later steps only polish details (a toy stage-split loop is sketched below).

#Visual Autoregressive Modeling #Next-Scale Prediction #Stage-Aware Acceleration
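
The exact speed-up mechanism is not described in the blurb; this toy loop only illustrates the stated intuition by assuming "acceleration" means spending full compute on the early, structure-setting scales and a cheaper pass on the later, detail-polishing ones. The stage split and both step functions are placeholders.

```python
import numpy as np

def full_step(canvas):
    """Stand-in for a full autoregressive next-scale prediction step."""
    return canvas + np.random.standard_normal(canvas.shape) * 0.1

def cheap_step(canvas):
    """Stand-in for a reduced-cost step (e.g. fewer layers or reused caches)."""
    return canvas + np.random.standard_normal(canvas.shape) * 0.02

def generate(scales, semantic_stages=2):
    """Run next-scale generation, switching to the cheap step after the
    early 'semantic' stages that fix global structure (assumed split)."""
    canvas = np.zeros((scales[0], scales[0]))
    for i, s in enumerate(scales):
        # Upsample the running canvas to the next resolution (nearest neighbour).
        reps = s // canvas.shape[0]
        canvas = np.kron(canvas, np.ones((reps, reps)))
        canvas = full_step(canvas) if i < semantic_stages else cheap_step(canvas)
    return canvas

print(generate([4, 8, 16, 32]).shape)  # (32, 32)
```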

RecTok: Reconstruction Distillation along Rectified Flow

Intermediate
Qingyu Shi, Size Wu et al. · Dec 15 · arXiv

RecTok is a new visual tokenizer that teaches the whole training path of a diffusion model (the forward flow) to carry semantic information about the image, not just the starting latent features (a rough sketch of path-wise distillation follows below).

#Rectified Flow #Flow Matching #Visual Tokenizer
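
A rough sketch of the stated idea under assumptions: supervise semantic features at intermediate points along the straight-line (rectified) flow between the latent and noise, rather than only at the starting latent. The mean-squared alignment loss and the sampled time steps are stand-ins, not RecTok's actual objective.

```python
import numpy as np

def rectified_flow_point(latent, noise, t):
    """Straight-line (rectified flow) interpolation between latent and noise."""
    return (1.0 - t) * latent + t * noise

def distill_loss(student_feats, teacher_feats):
    """Stand-in distillation objective: mean-squared feature alignment."""
    return float(np.mean((student_feats - teacher_feats) ** 2))

def path_distillation(latent, noise, teacher_feats, student, ts=(0.25, 0.5, 0.75)):
    """Supervise intermediate flow points, not just the t=0 latent (assumed form)."""
    return sum(distill_loss(student(rectified_flow_point(latent, noise, t)),
                            teacher_feats) for t in ts) / len(ts)

rng = np.random.default_rng(0)
z, eps = rng.standard_normal(16), rng.standard_normal(16)
teacher = rng.standard_normal(16)  # semantic features from a frozen encoder (toy)
student = lambda x: x              # toy projection head
print(path_distillation(z, eps, teacher, student))
```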

Bidirectional Normalizing Flow: From Data to Noise and Back

Intermediate
Yiyang Lu, Qiao Sun et al. · Dec 11 · arXiv

Normalizing flows are models that learn an invertible mapping that turns real images into simple noise and then back again (a tiny invertible building block is sketched below).

#Normalizing Flow #Bidirectional Normalizing Flow #Hidden Alignment
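
Since the summary is mainly a definition, here is a tiny invertible building block (an affine coupling layer) showing how a flow can map data toward noise and exactly undo the mapping; this is generic normalizing-flow machinery, not the paper's bidirectional scheme.

```python
import numpy as np

def coupling_forward(x, scale, shift):
    """One affine coupling layer: transform the second half of x
    conditioned on the (untouched) first half. Exactly invertible."""
    x1, x2 = np.split(x, 2)
    y2 = x2 * np.exp(scale(x1)) + shift(x1)
    return np.concatenate([x1, y2])

def coupling_inverse(y, scale, shift):
    """Invert the coupling layer: recover the data from the 'noise' side."""
    y1, y2 = np.split(y, 2)
    x2 = (y2 - shift(y1)) * np.exp(-scale(y1))
    return np.concatenate([y1, x2])

# Toy "networks": any functions of the first half work; invertibility is free.
scale = lambda h: np.tanh(h)
shift = lambda h: 0.5 * h

x = np.arange(4, dtype=float)               # stand-in for a flattened image
z = coupling_forward(x, scale, shift)       # data -> noise direction
x_back = coupling_inverse(z, scale, shift)  # noise -> data direction
print(np.allclose(x, x_back))               # True
```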

Distribution Matching Variational AutoEncoder

Beginner
Sen Ye, Jianning Pei et al. · Dec 8 · arXiv

This paper shows a new way to teach an autoencoder to shape its hidden space (the 'latent space') to match any distribution we want, not just a simple bell curve (one way to score such a match is sketched below).

#Distribution Matching VAE #Latent Space #Self-Supervised Learning
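
The blurb does not say how the latent distribution is matched, so this sketch uses a kernel MMD penalty between encoded latents and samples from an arbitrary target distribution as an illustrative stand-in for the paper's matching objective.

```python
import numpy as np

def rbf_mmd(a, b, bandwidth=1.0):
    """Squared maximum mean discrepancy with an RBF kernel: small when the
    two sample sets look like draws from the same distribution."""
    def k(u, v):
        d2 = ((u[:, None, :] - v[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * bandwidth ** 2))
    return k(a, a).mean() + k(b, b).mean() - 2 * k(a, b).mean()

rng = np.random.default_rng(0)
latents = rng.standard_normal((256, 2))     # stand-in for encoder outputs
target = rng.uniform(-1, 1, size=(256, 2))  # any target latent distribution
print(rbf_mmd(latents, target))             # penalty added to the reconstruction loss
```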