FASA: Frequency-aware Sparse Attention
Intermediate · Yifei Wang, Yueqi Wang et al. · Feb 3 · arXiv
FASA is a training-free method that speeds up decoding and reduces memory use in large language models by keeping only the most useful past tokens in the KV cache.
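The blurb does not spell out FASA's scoring rule, so the following is only a minimal sketch of the general idea: score each cached token by the attention mass it keeps receiving (a frequency-style usefulness signal) and evict the lowest-scoring tokens once the cache exceeds a fixed budget. The names (`score`, `budget`) and the accumulated-attention heuristic are illustrative assumptions, not the paper's published algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
d, budget, steps = 16, 8, 32          # head dim, cache budget, decode steps (toy sizes)

keys   = np.empty((0, d))             # cached K vectors, one row per kept token
values = np.empty((0, d))             # cached V vectors
score  = np.empty(0)                  # accumulated attention mass per cached token

for _ in range(steps):
    q = rng.normal(size=d)            # query for the current decode step
    k = rng.normal(size=d)            # new token's key and value
    v = rng.normal(size=d)

    keys   = np.vstack([keys, k])
    values = np.vstack([values, v])
    score  = np.append(score, 0.0)

    # standard attention over the (compressed) cache
    logits = keys @ q / np.sqrt(d)
    w = np.exp(logits - logits.max())
    w /= w.sum()
    out = w @ values                  # attention output for this step

    # frequency-style bookkeeping: tokens that keep receiving
    # attention mass accumulate a higher usefulness score
    score += w

    # evict the lowest-scoring tokens once the cache exceeds its budget
    if len(score) > budget:
        keep = np.argsort(score)[-budget:]
        keep.sort()                   # preserve original token order
        keys, values, score = keys[keep], values[keep], score[keep]

print(f"cache holds {len(score)} of {steps} tokens; last output norm = {np.linalg.norm(out):.3f}")
```

Because eviction is a pure post-hoc filter on the cache, a scheme like this needs no retraining, which matches the training-free claim above.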
#FASA · #Frequency-aware sparse attention · #KV cache compression