This paper speeds up diffusion language models (dLLMs) by changing the order in which they fill in masked-out words during generation.
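To make that concrete, here is a minimal PyTorch sketch of one common way to pick the fill-in order (most-confident positions first); it is not the paper's exact algorithm, and the `model`, `mask_id`, and `confidence_order_decode` names are placeholders assumed for illustration:

```python
import torch

def confidence_order_decode(model, tokens, mask_id, steps=8):
    # Illustrative only: fill the positions the model is most sure about first,
    # rather than strictly left to right.
    for _ in range(steps):
        masked = tokens == mask_id
        if not masked.any():
            break
        logits = model(tokens)                       # assumed: [seq_len, vocab] logits
        probs = torch.softmax(logits, dim=-1)
        conf, pred = probs.max(dim=-1)               # best guess + confidence per position
        conf[~masked] = -1.0                         # ignore positions already filled
        n_fill = max(1, int(masked.sum()) // steps)  # fill a few positions per step
        pick = conf.topk(min(n_fill, int(masked.sum()))).indices
        tokens = tokens.clone()
        tokens[pick] = pred[pick]                    # commit the most confident words
    return tokens
```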
Recursive transformers save memory by reusing the same set of layer weights over and over, but that weight sharing makes them less expressive and hurts accuracy.
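A toy PyTorch sketch of the layer-reuse idea, using the stock `nn.TransformerEncoderLayer` as a stand-in for the shared block; the `RecursiveTransformer` class and its sizes are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class RecursiveTransformer(nn.Module):
    # Toy version of weight sharing: one block applied `depth` times,
    # so the parameter count stays that of a single layer.
    def __init__(self, d_model=256, n_heads=4, depth=12):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.depth = depth

    def forward(self, x):
        for _ in range(self.depth):   # same weights reused on every pass
            x = self.block(x)
        return x

x = torch.randn(2, 16, 256)           # (batch, sequence, features)
print(RecursiveTransformer()(x).shape)
```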
The paper shows that video AIs do not need long, human-like chains of thought to reason well.
Big language models use RoPE to remember word order, but its attention score keeps only the real part of a complex-valued product and throws the imaginary part away.
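A small NumPy sketch of what that means: RoPE rotates feature pairs as complex numbers, and the usual attention score keeps only the real part of the resulting complex product; the `rope_rotate` helper and its parameters are assumptions for illustration, not the paper's code:

```python
import numpy as np

def rope_rotate(x, pos, theta=10000.0):
    # View each pair of features as one complex number and rotate it by a
    # position-dependent angle (the usual RoPE trick, simplified).
    d = x.shape[-1]
    freqs = theta ** (-np.arange(0, d, 2) / d)
    z = x[0::2] + 1j * x[1::2]
    return z * np.exp(1j * pos * freqs)

q, k = np.random.randn(8), np.random.randn(8)
zq, zk = rope_rotate(q, pos=3), rope_rotate(k, pos=7)
full = np.sum(zq * np.conj(zk))       # full complex-valued attention "score"
print(full.real)                      # standard RoPE attention keeps only this part
print(full.imag)                      # ...and silently discards this part
```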