Papers (7)

#latency reduction

Accelerating Diffusion via Hybrid Data-Pipeline Parallelism Based on Conditional Guidance Scheduling

Intermediate

Euisoo Jung, Byunghyun Kim et al. · Feb 25 · arXiv

Diffusion models make great images but are slow because they remove noise step by step over many iterations.

#diffusion inference #multi-GPU acceleration #data parallelism

RelayGen: Intra-Generation Model Switching for Efficient Reasoning

Intermediate

Jiwon Song, Yoongon Kim et al. · Feb 6 · arXiv

RelayGen is a training-free way to switch between a big model and a small model while one answer is being generated.

#RelayGen #intra-generation model switching #segment-level routing

Parallel-Probe: Towards Efficient Parallel Thinking via 2D Probing

Intermediate

Tong Zheng, Chengsong Huang et al. · Feb 3 · arXiv

Parallel-Probe is a simple add-on that lets many AI “thought paths” run at once but stop early once they already agree.

#parallel thinking #2D probing #consensus-based early stopping

Fast KVzip: Efficient and Accurate LLM Inference with Gated KV Eviction

Intermediate

Jang-Hyun Kim, Dongyoon Han et al. · Jan 25 · arXiv

Fast KVzip is a new way to shrink an LLM’s memory (the KV cache) while keeping answers just as accurate.

#KV cache compression #gated KV eviction #sink attention

Toward Efficient Agents: Memory, Tool learning, and Planning

Intermediate

Xiaofang Yang, Lijun Li et al. · Jan 20 · arXiv

This survey explains how to make AI agents not just smart, but also efficient with their time, memory, and tool use.

#agent efficiency #memory compression #tool learning

A BERTology View of LLM Orchestrations: Token- and Layer-Selective Probes for Efficient Single-Pass Classification

Intermediate

Gonzalo Ariel Meyoyan, Luciano Del Corro · Jan 19 · arXiv

This paper shows how to add a tiny helper (a probe) to a big language model so it can classify things like safety or sentiment in the same pass it already runs to answer you.

#LLM orchestration #single-pass classification #hidden-state probing

Arbitrage: Efficient Reasoning via Advantage-Aware Speculation

Intermediate

Monishwaran Maheswaran, Rishabh Tiwari et al. · Dec 4 · arXiv

ARBITRAGE makes AI solve step-by-step problems faster by only using the big, slow model when it is predicted to truly help.

#speculative decoding #step-level speculative decoding #advantage-aware routing
