LLaDA2.0: Scaling Up Diffusion Language Models to 100B
Intermediate · Tiwei Bie, Maosong Cao et al. · Dec 10 · arXiv
Before this work, most large language models generated text autoregressively, one token at a time, which made generation slow and inherently hard to parallelize.
#diffusion language model · #masked diffusion · #block diffusion
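The contrast between token-by-token autoregressive decoding and diffusion-style decoding can be sketched with a toy loop. This is not the paper's algorithm, only an illustrative simplification: a real masked diffusion model predicts all masked positions with a neural network, whereas this stand-in samples from a fixed vocabulary. All names (`toy_denoise_step`, `diffusion_decode`, `fill_fraction`) are made up for the sketch.

```python
import random

MASK = "<mask>"

def toy_denoise_step(tokens, vocab, fill_fraction=0.5):
    # Fill a fraction of the still-masked positions in parallel.
    # A real model would predict all masked positions jointly; this
    # toy stand-in just samples tokens from a fixed vocabulary.
    masked = [i for i, t in enumerate(tokens) if t == MASK]
    k = max(1, int(len(masked) * fill_fraction))
    for i in random.sample(masked, min(k, len(masked))):
        tokens[i] = random.choice(vocab)
    return tokens

def diffusion_decode(length, vocab, steps=4):
    # Masked-diffusion-style decoding: start fully masked, then
    # repeatedly unmask many positions per step, instead of exactly
    # one new token per step as in autoregressive decoding.
    tokens = [MASK] * length
    for _ in range(steps):
        if MASK not in tokens:
            break
        toy_denoise_step(tokens, vocab)
    return tokens

out = diffusion_decode(8, ["the", "cat", "sat"])
print(out)
```

The key point the sketch illustrates: each denoising step commits several positions at once, so the number of sequential steps can be far smaller than the sequence length, which is the parallelism advantage claimed for diffusion language models.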