This paper introduces XDLM, a single model that blends two popular discrete diffusion styles (masked and uniform) so that one network can both understand and generate text and images well.
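To make the masked/uniform blend concrete, here is a minimal sketch of a hybrid forward-corruption step in the D3PM tradition, where each token is either masked (absorbing noise) or swapped for a random token (uniform noise). The function name `hybrid_corrupt`, the `MASK_ID` convention, and the probabilities are hypothetical illustrations, not XDLM's actual formulation.

```python
import torch

VOCAB_SIZE = 1000        # hypothetical vocabulary size
MASK_ID = VOCAB_SIZE     # one extra [MASK] id appended after the vocabulary

def hybrid_corrupt(x0: torch.Tensor, mask_prob: float, uniform_prob: float) -> torch.Tensor:
    """Corrupt x0 with a blend of masked (absorbing) and uniform noise.

    Independently per token:
      - with probability mask_prob, replace it with [MASK];
      - with probability uniform_prob, replace it with a random vocab token;
      - otherwise keep it unchanged.
    """
    u = torch.rand(x0.shape)                     # one uniform draw per token
    xt = x0.clone()
    xt[u < mask_prob] = MASK_ID                  # masked-diffusion branch
    unif = (u >= mask_prob) & (u < mask_prob + uniform_prob)
    xt[unif] = torch.randint(0, VOCAB_SIZE, (int(unif.sum()),))  # uniform branch
    return xt

# Example: 30% masking plus 10% random replacement on a toy sequence.
tokens = torch.randint(0, VOCAB_SIZE, (1, 16))
noisy = hybrid_corrupt(tokens, mask_prob=0.3, uniform_prob=0.1)
```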
The paper asks what a truly good diffusion-based language model should look like and lists five must-have properties.
This paper studies how a newer kind of language model, called a discrete diffusion language model (DLM), gets better as we give it more data, bigger models, and more compute.
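For readers new to scaling laws, the usual way to make "gets better with scale" precise is a parametric loss curve in model size N and training tokens D. The Chinchilla-style form and every constant below are generic placeholders for illustration; they are not the fitted law or coefficients from this paper.

```python
def scaling_loss(N: float, D: float,
                 E: float = 1.7, A: float = 400.0, B: float = 1200.0,
                 alpha: float = 0.34, beta: float = 0.28) -> float:
    """Generic Chinchilla-style loss curve: loss falls as a power law in
    parameters N and training tokens D, toward an irreducible floor E.
    All constants here are made-up placeholders, not this paper's fit."""
    return E + A / N**alpha + B / D**beta

# Doubling data at a fixed model size lowers the predicted loss:
print(scaling_loss(N=1e8, D=1e10))
print(scaling_loss(N=1e8, D=2e10))
```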
Before this work, most big language models generated text one word at a time (autoregressively), which made generation slow and hard to parallelize.
Diffusion language models write by gradually unmasking hidden words, so deciding which blanks to reveal next is a big deal for both speed and accuracy.
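One common reveal strategy is confidence-based unmasking (popularized by MaskGIT-style decoders): at each step, commit the predictions the model is most sure about and keep the rest masked. The sketch below is a generic illustration under that assumption; `confidence_unmask_step` and its arguments are hypothetical names, not any specific paper's API.

```python
import torch

def confidence_unmask_step(logits: torch.Tensor, xt: torch.Tensor,
                           mask_id: int, n_reveal: int) -> torch.Tensor:
    """Reveal the n_reveal masked positions the model is most confident about.

    logits: (seq_len, vocab) predictions for every position.
    xt:     (seq_len,) current sequence; masked slots hold mask_id.
    """
    probs = logits.softmax(dim=-1)
    conf, pred = probs.max(dim=-1)                 # confidence and argmax token per slot
    conf = conf.masked_fill(xt != mask_id, -1.0)   # only masked slots may compete
    k = min(n_reveal, int((xt == mask_id).sum()))  # never reveal more than remain
    reveal = conf.topk(k).indices
    xt = xt.clone()
    xt[reveal] = pred[reveal]                      # commit the surest predictions
    return xt

# Example: one reveal step on a toy, fully masked length-8 sequence.
MASK_ID, VOCAB = 100, 100
xt = torch.full((8,), MASK_ID)
logits = torch.randn(8, VOCAB)
xt = confidence_unmask_step(logits, xt, MASK_ID, n_reveal=2)
```

In practice the number of tokens revealed per step follows a schedule (e.g., cosine), which is exactly the speed/accuracy trade-off described above: fewer, larger reveal steps are faster but riskier.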