dLLM is a single, open-source toolbox that standardizes how diffusion language models are trained, run, and tested.
The paper studies Mamba-2 (a fast, linear-time sequence model offered as an alternative to attention) and pares it down to the pieces that truly boost accuracy.
The paper fixes a common problem in video world models: scenes slowly change or “drift” when the camera moves and comes back.
The paper fixes a big problem in long video generation: models either forget what happened or slowly drift off-topic over time.
Long texts make language models slow because they must keep and re-check a huge memory called the KV cache for every new word they write.
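A back-of-the-envelope calculation shows why this matters: the cache grows linearly with the text length. All model dimensions below are illustrative assumptions (a hypothetical 32-layer model in 16-bit precision), not numbers from the paper.

```python
# Back-of-the-envelope KV-cache size for a hypothetical decoder-only model.
# All dimensions are illustrative assumptions, not from any specific paper.
def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=8, head_dim=128, dtype_bytes=2):
    # Each layer stores one key and one value vector per token per KV head.
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes * seq_len

for seq_len in (4_096, 32_768, 131_072):
    gib = kv_cache_bytes(seq_len) / 2**30
    print(f"{seq_len:>7} tokens -> {gib:4.1f} GiB of KV cache")
```

Under these assumptions the cache alone costs 128 KiB per token, so a 131k-token context needs 16 GiB just to remember its own past.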
Multi-agent LLM systems often use LoRA adapters so each agent has a special role, but they all rebuild almost the same KV cache, wasting memory and time.
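The wasted work is easy to picture: when agents share a base model and a common prompt prefix, the expensive prefill only has to happen once. The sketch below is purely illustrative (the caching function and `Agent` class are stand-ins, not the paper's implementation).

```python
# Illustrative sketch: LoRA-specialized agents sharing one base model can also
# share the KV cache for a common prompt prefix, instead of each agent
# recomputing it. prefill_kv is a stand-in for the expensive prefill pass.
from functools import lru_cache

@lru_cache(maxsize=None)
def prefill_kv(prefix: str) -> dict:
    # Pretend this runs the base model over the prefix and returns its KV cache.
    return {"n_tokens": len(prefix.split())}

class Agent:
    def __init__(self, role: str):
        self.role = role  # each role would map to its own LoRA adapter

    def answer(self, shared_context: str, question: str) -> str:
        kv = prefill_kv(shared_context)  # cache hit for every agent after the first
        return f"[{self.role}] reuses {kv['n_tokens']}-token prefix to answer: {question}"

ctx = "Shared system prompt plus retrieved documents for all agents"
agents = [Agent("planner"), Agent("coder"), Agent("critic")]
replies = [a.answer(ctx, "What next?") for a in agents]
print(prefill_kv.cache_info())  # 1 miss (first prefill), 2 hits (reused)
```

Three agents, one prefill: the second and third calls hit the cache instead of rebuilding it.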
Robots used to copy actions from videos without truly understanding how the world changes, so they often messed up long, multi-step jobs.
This paper fixes a big problem in long video-making AIs where the video keeps snapping back to the beginning, like a movie stuck on rewind.
HERMES is a training-free way to make video-language models understand live, streaming video quickly and accurately.
The paper proposes Diffusion in Diffusion, a draft-then-revise method that brings back global coherence to fast, block-based diffusion language models.
This paper introduces PCED, a way to use many documents as separate 'experts' in parallel so an AI can stitch answers together without stuffing everything into one giant prompt.
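The fan-out/stitch pattern can be sketched in a few lines. Everything here is a placeholder: `expert_answer` stands in for an LLM call conditioned on a single document, and the keyword-overlap scoring and string-join aggregation are illustrative choices, not PCED's actual method.

```python
# Hypothetical sketch of the per-document "expert" idea: query each document
# in parallel, then stitch the partial answers together.
from concurrent.futures import ThreadPoolExecutor

def expert_answer(doc: str, question: str) -> str:
    # Stand-in for one LLM call over a single document (keyword overlap only).
    hits = [w for w in question.lower().split() if w in doc.lower()]
    return f"found {len(hits)} relevant term(s)" if hits else "no answer"

docs = ["Paris is the capital of France.",
        "The Nile is the longest river in Africa.",
        "Mount Everest is the tallest mountain."]
question = "capital of France"

# Fan out: one independent "expert" per document, run concurrently.
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(lambda d: expert_answer(d, question), docs))

# Stitch: keep only the experts that had something to say.
answer = "; ".join(p for p in partials if p != "no answer")
print(answer)
```

The point of the pattern is that no single prompt ever has to hold all the documents at once; each expert sees one document and the aggregation step does the stitching.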
This paper shows how to get strong text embeddings from decoder-only language models without any training.
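One common training-free recipe is to pool the per-token hidden states of the last layer into a single vector. The sketch below assumes you already have those hidden states; mean pooling and unit-normalization are one standard choice, and the paper's exact pooling may differ.

```python
# Minimal sketch of a training-free embedding from a decoder-only LM, assuming
# per-token last-layer hidden states are already available. Mean pooling over
# non-padding tokens is one common recipe; the paper's method may differ.
import numpy as np

def embed(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    # hidden_states: (seq_len, dim); attention_mask: (seq_len,), 1 = real token.
    mask = attention_mask[:, None].astype(float)
    pooled = (hidden_states * mask).sum(axis=0) / mask.sum()
    return pooled / np.linalg.norm(pooled)  # unit-normalize for cosine similarity

rng = np.random.default_rng(0)
h = rng.normal(size=(5, 8))        # fake hidden states for a 5-token text
m = np.array([1, 1, 1, 1, 0])      # last position is padding, so it is ignored
v = embed(h, m)
print(v.shape)  # (8,) -- one fixed-size vector per text
```

Because the output is unit-normalized, a plain dot product between two such vectors is already a cosine similarity.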