dLLM is a single, open-source toolbox that standardizes how diffusion language models are trained, run, and tested.
The paper studies Mamba-2 (a fast, linear-time sequence model offered as an alternative to attention) and pares it down to the pieces that truly boost accuracy.
The paper fixes a common problem in video world models: scenes slowly change or “drift” when the camera moves and comes back.
The paper fixes a big problem in long video generation: models either forget what happened or slowly drift off-topic over time.
Long texts make language models slow because they must keep and re-check a huge memory called the KV cache for every new word they write.
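A back-of-the-envelope calculation shows why this matters: the cache grows linearly with the text length. All model dimensions below are illustrative assumptions (a hypothetical 32-layer model in 16-bit precision), not numbers from the paper.

```python
# Back-of-the-envelope KV-cache size for a hypothetical decoder-only model.
# All dimensions are illustrative assumptions, not from any specific paper.
def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=8, head_dim=128, dtype_bytes=2):
    # Each layer stores one key and one value vector per token per KV head.
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes * seq_len

for seq_len in (4_096, 32_768, 131_072):
    gib = kv_cache_bytes(seq_len) / 2**30
    print(f"{seq_len:>7} tokens -> {gib:4.1f} GiB of KV cache")
```

Under these assumptions the cache alone costs 128 KiB per token, so a 131k-token context needs 16 GiB just to remember its own past.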
Multi-agent LLM systems often use LoRA adapters so each agent has a special role, but they all rebuild almost the same KV cache, wasting memory and time.
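The wasted work is easy to picture: when agents share a base model and a common prompt prefix, the expensive prefill only has to happen once. The sketch below is purely illustrative (the caching function and `Agent` class are stand-ins, not the paper's implementation).

```python
# Illustrative sketch: LoRA-specialized agents sharing one base model can also
# share the KV cache for a common prompt prefix, instead of each agent
# recomputing it. prefill_kv is a stand-in for the expensive prefill pass.
from functools import lru_cache

@lru_cache(maxsize=None)
def prefill_kv(prefix: str) -> dict:
    # Pretend this runs the base model over the prefix and returns its KV cache.
    return {"n_tokens": len(prefix.split())}

class Agent:
    def __init__(self, role: str):
        self.role = role  # each role would map to its own LoRA adapter

    def answer(self, shared_context: str, question: str) -> str:
        kv = prefill_kv(shared_context)  # cache hit for every agent after the first
        return f"[{self.role}] reuses {kv['n_tokens']}-token prefix to answer: {question}"

ctx = "Shared system prompt plus retrieved documents for all agents"
agents = [Agent("planner"), Agent("coder"), Agent("critic")]
replies = [a.answer(ctx, "What next?") for a in agents]
print(prefill_kv.cache_info())  # 1 miss (first prefill), 2 hits (reused)
```

Three agents, one prefill: the second and third calls hit the cache instead of rebuilding it.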
Robots used to copy actions from videos without truly understanding how the world changes, so they often messed up long, multi-step jobs.
This paper fixes a big problem in long video-making AIs where the video keeps snapping back to the beginning, like a movie stuck on rewind.
HERMES is a training-free way to make video-language models understand live, streaming video quickly and accurately.
The paper proposes Diffusion in Diffusion, a draft-then-revise method that brings back global coherence to fast, block-based diffusion language models.
This paper introduces PCED, a way to use many documents as separate 'experts' in parallel so an AI can stitch answers together without stuffing everything into one giant prompt.
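The fan-out/stitch pattern can be sketched in a few lines. Everything here is a placeholder: `expert_answer` stands in for an LLM call conditioned on a single document, and the keyword-overlap scoring and string-join aggregation are illustrative choices, not PCED's actual method.

```python
# Hypothetical sketch of the per-document "expert" idea: query each document
# in parallel, then stitch the partial answers together.
from concurrent.futures import ThreadPoolExecutor

def expert_answer(doc: str, question: str) -> str:
    # Stand-in for one LLM call over a single document (keyword overlap only).
    hits = [w for w in question.lower().split() if w in doc.lower()]
    return f"found {len(hits)} relevant term(s)" if hits else "no answer"

docs = ["Paris is the capital of France.",
        "The Nile is the longest river in Africa.",
        "Mount Everest is the tallest mountain."]
question = "capital of France"

# Fan out: one independent "expert" per document, run concurrently.
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(lambda d: expert_answer(d, question), docs))

# Stitch: keep only the experts that had something to say.
answer = "; ".join(p for p in partials if p != "no answer")
print(answer)
```

The point of the pattern is that no single prompt ever has to hold all the documents at once; each expert sees one document and the aggregation step does the stitching.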
This paper shows how to get strong text embeddings from decoder-only language models without any training.
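One common training-free recipe is to pool the per-token hidden states of the last layer into a single vector. The sketch below assumes you already have those hidden states; mean pooling and unit-normalization are one standard choice, and the paper's exact pooling may differ.

```python
# Minimal sketch of a training-free embedding from a decoder-only LM, assuming
# per-token last-layer hidden states are already available. Mean pooling over
# non-padding tokens is one common recipe; the paper's method may differ.
import numpy as np

def embed(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    # hidden_states: (seq_len, dim); attention_mask: (seq_len,), 1 = real token.
    mask = attention_mask[:, None].astype(float)
    pooled = (hidden_states * mask).sum(axis=0) / mask.sum()
    return pooled / np.linalg.norm(pooled)  # unit-normalize for cosine similarity

rng = np.random.default_rng(0)
h = rng.normal(size=(5, 8))        # fake hidden states for a 5-token text
m = np.array([1, 1, 1, 1, 0])      # last position is padding, so it is ignored
v = embed(h, m)
print(v.shape)  # (8,) -- one fixed-size vector per text
```

Because the output is unit-normalized, a plain dot product between two such vectors is already a cosine similarity.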