This paper introduces XDLM, a single model that blends two popular discrete diffusion styles (masked and uniform noise) so that it performs well at both understanding and generating text and images.
The paper proposes Diffusion in Diffusion, a draft-then-revise method that restores global coherence to fast, block-based diffusion language models.
Dream-VL and Dream-VLA build on a diffusion language model backbone to understand images, describe them, and plan actions, outperforming many standard (autoregressive) models.
This paper studies how a newer kind of language model, the discrete diffusion language model (DLM), improves as it is scaled up in training data, model size, and compute.
Diffusion language models (dLLMs) can write all parts of an answer in parallel, but they usually need many small refinement steps, which makes generation slow.
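To make the speed issue above concrete, here is a toy sketch of masked-diffusion-style parallel decoding. Everything in it is an illustrative assumption: `toy_parallel_decode` is a hypothetical name, the "model" is a stand-in that already knows the answer and scores masked positions randomly, and real dLLM samplers differ in how they pick positions. It only shows the step count: committing k tokens per step finishes an L-token answer in ceil(L/k) steps.

```python
import random

MASK = None  # sentinel for a position that is still masked


def toy_parallel_decode(target, tokens_per_step, seed=0):
    """Toy parallel decoder (hypothetical, for illustration only).

    A stand-in 'model' that already knows `target` gives every masked
    position a random confidence score. Each step commits the
    `tokens_per_step` highest-scoring masked positions at once.
    Returns the decoded sequence and the number of steps used.
    """
    if tokens_per_step <= 0:
        raise ValueError("tokens_per_step must be positive")
    rng = random.Random(seed)
    seq = [MASK] * len(target)
    steps = 0
    while any(tok is MASK for tok in seq):
        masked = [i for i, tok in enumerate(seq) if tok is MASK]
        # Stand-in for model confidence: one random score per masked position.
        scores = {i: rng.random() for i in masked}
        chosen = sorted(masked, key=lambda i: scores[i], reverse=True)[:tokens_per_step]
        # Commit the chosen positions in parallel within this single step.
        for i in chosen:
            seq[i] = target[i]
        steps += 1
    return seq, steps


if __name__ == "__main__":
    answer = [f"tok{i}" for i in range(64)]
    for k in (1, 8, 32):
        _, n_steps = toy_parallel_decode(answer, k)
        print(f"{k} token(s) per step -> {n_steps} steps")
```

For a 64-token answer, committing one token per step takes 64 steps, while committing 8 per step takes 8. Real dLLMs cannot simply raise k, because tokens committed in the same step are chosen without seeing each other, which tends to hurt quality; that trade-off is why they typically fall back on many small steps.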