Proact-VL is an AI that talks over live video and knows not only what to say but also when to say it, like a great sports commentator.
dLLM is a single, open-source toolbox that standardizes how diffusion language models are trained, run, and tested.
The paper studies Mamba-2 (a fast, linear-time alternative to standard attention) and pares it down to the pieces that truly boost accuracy.
Voxtral Realtime is a speech-to-text model that types what you say almost instantly, while keeping accuracy close to the best offline systems.
The paper fixes a common problem in video world models: scenes slowly change or “drift” when the camera moves and comes back.
The paper fixes a big problem in long video generation: models either forget what happened or slowly drift off-topic over time.
Long texts make language models slow because they must keep and re-check a huge memory called the KV cache for every new word they write.
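A generic sketch (not tied to any one paper above) of why that memory hurts: the cache stores a key and a value for every layer, head, and past token, so it grows linearly with text length, and each new token must attend over all of it. The model shape below is a hypothetical 7B-class configuration chosen only for illustration.

```python
def kv_cache_bytes(num_layers, num_heads, head_dim, seq_len, bytes_per_value=2):
    """Memory held by the KV cache: one key and one value per layer,
    head, and past token (fp16 -> 2 bytes per stored number)."""
    per_token = num_layers * num_heads * head_dim * 2  # 2 = key + value
    return per_token * seq_len * bytes_per_value

# Hypothetical 7B-class shape: 32 layers, 32 heads, 128-dim heads.
short = kv_cache_bytes(32, 32, 128, 1_000)     # ~0.5 GB at 1k tokens
long = kv_cache_bytes(32, 32, 128, 100_000)    # ~52 GB at 100k tokens
print(long // short)  # 100: cache memory scales linearly with length
```

Run it and the ratio comes out to exactly 100, which is why context length, not model size, is often what exhausts GPU memory during generation.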
Multi-agent LLM systems often use LoRA adapters so each agent has a special role, but they all rebuild almost the same KV cache, wasting memory and time.
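The waste is easy to see with a toy prefix cache (this is a generic illustration of shared-prefix caching, not the paper's actual mechanism; the class and names here are made up): if every agent's prompt starts with the same base text, one cached entry can serve all of them.

```python
class PrefixKVCache:
    """Toy cache keyed by the shared prompt prefix; the string value
    stands in for the real key/value tensors."""
    def __init__(self):
        self.store = {}
        self.computations = 0  # how many caches we actually built

    def get(self, prompt, shared_prefix_len):
        prefix = prompt[:shared_prefix_len]
        if prefix not in self.store:
            self.computations += 1
            self.store[prefix] = f"kv({prefix})"
        return self.store[prefix]

cache = PrefixKVCache()
base = "You are a helpful agent. Tools: search, code.\n"
for role in ["planner: ", "coder: ", "critic: "]:
    cache.get(base + role, shared_prefix_len=len(base))
print(cache.computations)  # 1: three agents reuse one shared-prefix cache
```

Without sharing, each of the three agents would rebuild the same base-prompt cache, tripling both the memory and the prefill compute.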
Robots used to copy actions from videos without truly understanding how the world changes, so they often messed up long, multi-step jobs.
This paper fixes a big problem in long video-making AIs where the video keeps snapping back to the beginning, like a movie stuck on rewind.
HERMES is a training-free way to make video-language models understand live, streaming video quickly and accurately.
The paper proposes Diffusion in Diffusion, a draft-then-revise method that brings back global coherence to fast, block-based diffusion language models.