How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (5)

#latency reduction

Parallel-Probe: Towards Efficient Parallel Thinking via 2D Probing

Intermediate
Tong Zheng, Chengsong Huang et al. · Feb 3 · arXiv

Parallel-Probe is a simple add-on that lets many AI “thought paths” run in parallel and stops them early once they already agree.

#parallel thinking #2D probing #consensus-based early stopping

Fast KVzip: Efficient and Accurate LLM Inference with Gated KV Eviction

Intermediate
Jang-Hyun Kim, Dongyoon Han et al. · Jan 25 · arXiv

Fast KVzip is a new way to shrink an LLM’s memory (the KV cache) while keeping answers just as accurate.

#KV cache compression #gated KV eviction #sink attention

Toward Efficient Agents: Memory, Tool learning, and Planning

Intermediate
Xiaofang Yang, Lijun Li et al. · Jan 20 · arXiv

This survey explains how to make AI agents not just smart, but also efficient with their time, memory, and tool use.

#agent efficiency #memory compression #tool learning

A BERTology View of LLM Orchestrations: Token- and Layer-Selective Probes for Efficient Single-Pass Classification

Intermediate
Gonzalo Ariel Meyoyan, Luciano Del Corro · Jan 19 · arXiv

This paper shows how to add a tiny helper (a probe) to a big language model so it can classify things like safety or sentiment in the same pass it already runs to answer you.

#LLM orchestration #single-pass classification #hidden-state probing

Arbitrage: Efficient Reasoning via Advantage-Aware Speculation

Intermediate
Monishwaran Maheswaran, Rishabh Tiwari et al. · Dec 4 · arXiv

ARBITRAGE makes AI solve step-by-step problems faster by only using the big, slow model when it is predicted to truly help.

#speculative decoding #step-level speculative decoding #advantage-aware routing