Fast KVzip: Efficient and Accurate LLM Inference with Gated KV Eviction
Intermediate · Jang-Hyun Kim, Dongyoon Han et al. · Jan 25 · arXiv
Fast KVzip is a new way to shrink an LLM's memory (the KV cache) while keeping answers just as accurate.
#KV cache compression  #gated KV eviction  #sink attention