This paper introduces XDLM, a single model that blends two popular diffusion styles (masked and uniform) so it can both understand and generate text and images well.
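To make the "masked vs. uniform" distinction concrete, here is a tiny Python sketch of how one corruption step could mix the two noise types. The mixing weight `lam`, the mask token id, and the vocabulary size are illustrative assumptions, not XDLM's actual formulation.

```python
# Minimal sketch: blend masked and uniform discrete-diffusion corruption.
# All specifics here (lam, mask_id, vocab_size) are assumptions for illustration.
import torch

def corrupt(tokens, t, vocab_size, mask_id, lam=0.5):
    """Corrupt a clean token sequence at noise level t in [0, 1].

    Each position is corrupted with probability t; a corrupted position
    becomes the mask token with probability lam (masked diffusion) or a
    uniformly random token with probability 1 - lam (uniform diffusion).
    """
    corrupt_pos = torch.rand_like(tokens, dtype=torch.float) < t
    use_mask = torch.rand_like(tokens, dtype=torch.float) < lam
    random_tok = torch.randint_like(tokens, vocab_size)
    noisy = torch.where(use_mask, torch.full_like(tokens, mask_id), random_tok)
    return torch.where(corrupt_pos, noisy, tokens)

# Toy usage: corrupt a short sequence halfway through the diffusion process.
x = torch.tensor([[5, 17, 42, 8, 23]])
print(corrupt(x, t=0.5, vocab_size=100, mask_id=99))
```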
Diffusion language models can write tokens in any order, but that freedom can actually hurt how well they reason.
The paper asks what a truly good diffusion-based language model should look like and lists five must-have properties.
This paper studies how a newer kind of language model, called a discrete diffusion language model (DLM), gets better as we scale up its data, model size, and compute.
Before this work, most big language models generated text one word at a time (autoregressively), which makes generation slow and hard to parallelize.
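The sequential bottleneck is easy to see in code: each new token depends on everything generated so far, so the loop below cannot run its steps in parallel. The `toy_next_token` function is a stand-in for a real model's next-token choice, not anything from the paper.

```python
# Minimal sketch of autoregressive generation: one model call per token,
# and each call must wait for all earlier tokens, so the loop is sequential.
import random

def toy_next_token(prefix):
    # Stand-in for model(prefix) -> next token; here just a random pick.
    return random.choice(["the", "cat", "sat", "<eos>"])

def generate_autoregressive(max_len=10):
    tokens = []
    for _ in range(max_len):            # one step per output token
        nxt = toy_next_token(tokens)    # depends on every earlier token
        if nxt == "<eos>":
            break
        tokens.append(nxt)
    return tokens

print(generate_autoregressive())
```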
Diffusion language models write by gradually unmasking hidden words, so deciding which blanks to reveal next is a big deal for both speed and accuracy.
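One common way to decide which blanks to reveal next is to pick the positions the model is currently most confident about. The sketch below illustrates that idea with a placeholder model; it is only one of the many selection rules such papers study, and none of the names or numbers here come from a specific system.

```python
# Minimal sketch of confidence-based unmasking for a masked diffusion decoder:
# at each step, fill in the masked positions with the highest model confidence.
import torch

def decode_by_confidence(model, tokens, mask_id, steps=4):
    tokens = tokens.clone()
    masked = tokens == mask_id
    per_step = max(1, int(masked.sum()) // steps)
    while masked.any():
        logits = model(tokens)                      # (seq_len, vocab)
        probs = logits.softmax(dim=-1)
        probs[:, mask_id] = 0.0                     # never predict the mask token
        conf, pred = probs.max(dim=-1)              # confidence and best token
        conf = conf.masked_fill(~masked, -1.0)      # only consider masked slots
        k = min(per_step, int(masked.sum()))
        reveal = conf.topk(k).indices               # most confident masked slots
        tokens[reveal] = pred[reveal]
        masked = tokens == mask_id
    return tokens

# Toy usage: a random "model" over a 50-token vocabulary and an all-masked sequence.
vocab, mask_id = 50, 49
model = lambda toks: torch.randn(toks.shape[0], vocab)
seq = torch.full((8,), mask_id)
print(decode_by_confidence(model, seq, mask_id))
```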