This paper introduces Log-linear Sparse Attention (LLSA), a sparse attention mechanism that lets Diffusion Transformers attend only to the most relevant tokens, which it finds with a hierarchical search.
This paper speeds up diffusion language models (dLLMs) by changing the order in which they fill in missing words.