SWAA: Sliding Window Attention Adaptation for Efficient Long-Context LLMs Without Pretraining
Intermediate · Yijiong Yu, Jiale Liu et al. · Dec 11 · arXiv
Long texts make standard attention in large language models prohibitively slow because every token attends to every other token, so compute and memory grow quadratically with sequence length.
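To see the contrast, here is a minimal sketch (not the SWAA implementation, whose adaptation details are in the paper) of causal sliding-window attention in PyTorch: each query attends only to the `window` most recent tokens, so cost grows linearly with sequence length instead of quadratically. The function name and `window` parameter are illustrative.

```python
# Minimal sliding-window attention sketch (illustrative, not from SWAA).
import torch
import torch.nn.functional as F

def sliding_window_attention(q, k, v, window: int):
    """q, k, v: (batch, heads, seq, dim). Each query attends only to the
    `window` most recent positions (itself included), giving linear rather
    than quadratic cost in sequence length."""
    seq = q.size(-2)
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5  # (b, h, seq, seq)
    idx = torch.arange(seq)
    # Key j is visible to query i only if i - window < j <= i (causal + window).
    mask = (idx[None, :] <= idx[:, None]) & (idx[None, :] > idx[:, None] - window)
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v
```

For example, with `window=4` and a sequence of 8 tokens, token 7 attends only to tokens 4 through 7 rather than all eight positions.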
#Sliding Window Attention · #SWAA · #FA Decode