This paper makes long video generation much faster and lighter on memory by cutting out repeated work in the attention computation.
This paper builds MemoryRewardBench, a big test that checks if reward models (AI judges) can fairly grade how other AIs manage long-term memory, not just whether their final answers are right.
Diffusion language models (dLLMs) can write all parts of an answer in parallel, but they usually take many tiny cleanup steps, which makes them slow.