How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (22)

#KV cache

Parallel Context-of-Experts Decoding for Retrieval Augmented Generation

Intermediate
Giulio Corallo, Paolo Papotti · Jan 13 · arXiv

This paper introduces PCED, a way to use many documents as separate 'experts' in parallel so an AI can stitch answers together without stuffing everything into one giant prompt.

#Retrieval-Augmented Generation #PCED #contrastive decoding

GlimpRouter: Efficient Collaborative Inference by Glimpsing One Token of Thoughts

Beginner
Wenhao Zeng, Xuteng Zhang et al. · Jan 8 · arXiv

Big reasoning AIs think in many steps, which is slow and costly.

#collaborative inference #initial token entropy #step-level routing

KV-Embedding: Training-free Text Embedding via Internal KV Re-routing in Decoder-only LLMs

Intermediate
Yixuan Tang, Yi Yang · Jan 3 · arXiv

This paper shows how to get strong text embeddings from decoder-only language models without any training.

#text embeddings #decoder-only LLMs #causal attention

LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation

Beginner
Ethan Chern, Zhulin Hu et al. · Dec 29 · arXiv

LiveTalk turns slow, many-step video diffusion into a fast, 4-step, real-time system for talking avatars that listen, think, and respond with synchronized video.

#real-time video diffusion #on-policy distillation #multimodal conditioning

SpotEdit: Selective Region Editing in Diffusion Transformers

Intermediate
Zhibin Qin, Zhenxiong Tan et al. · Dec 26 · arXiv

SpotEdit is a training-free way to edit only the parts of an image that actually change, instead of regenerating the whole picture.

#Diffusion Transformer #Selective image editing #Region-aware editing

HiStream: Efficient High-Resolution Video Generation via Redundancy-Eliminated Streaming

Intermediate
Haonan Qiu, Shikun Liu et al. · Dec 24 · arXiv

HiStream makes 1080p video generation much faster by removing repeated work across space, time, and steps.

#high-resolution video generation #diffusion transformer (DiT) #dual-resolution caching

Kling-Omni Technical Report

Intermediate
Kling Team, Jialu Chen et al. · Dec 18 · arXiv

Kling-Omni is a single, unified model that can understand text, images, and videos together and then make or edit high-quality videos from those mixed instructions.

#multimodal visual language #MVL #prompt enhancer

LoPA: Scaling dLLM Inference via Lookahead Parallel Decoding

Beginner
Chenkai Xu, Yijie Jin et al. · Dec 18 · arXiv

This paper speeds up diffusion language models (dLLMs) by changing the order in which they fill in missing words.

#Diffusion LLM #Parallel decoding #Token Filling Order

MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives

Intermediate
Sihui Ji, Xi Chen et al. · Dec 16 · arXiv

MemFlow is a new way for AI to remember the right parts of a long video story while it keeps making new parts, so characters and scenes stay consistent.

#MemFlow #Narrative Adaptive Memory #Sparse Memory Activation

VideoSSM: Autoregressive Long Video Generation with Hybrid State-Space Memory

Beginner
Yifei Yu, Xiaoshan Wu et al. · Dec 4 · arXiv

VideoSSM is a new way to make long, stable, and lively videos by giving the model two kinds of memory: a short-term window and a long-term state-space memory.

#autoregressive video diffusion #state-space model #hybrid memory