SLA2: Sparse-Linear Attention with Learnable Routing and QAT
Intermediate · Jintao Zhang, Haoxu Wang et al. · Feb 13 · arXiv
SLA2 makes attention in AI models faster by splitting the work between two helpers: a precise one (sparse attention) and a fast one (linear attention), with a learnable router deciding how the work is divided.
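To make the "two helpers" idea concrete, here is a minimal single-query sketch in pure Python. It is not SLA2's actual algorithm. The top-k key selection, the elu(x)+1 feature map, and the fixed mixing weight `alpha` (a stand-in for the learnable router) are all assumptions chosen for illustration, and `hybrid_attend` is a hypothetical name. The highest-scoring keys go through exact softmax attention, the remaining keys go through linear attention, and the two outputs are blended.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def phi(x):
    # elu(x) + 1: a positive feature map commonly used in linear attention
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]

def softmax_attend(q, keys, values):
    # Exact (precise) attention: softmax over scaled dot-product scores
    scores = [dot(q, k) / math.sqrt(len(q)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return [sum(wi * v[c] for wi, v in zip(w, values)) / z
            for c in range(len(values[0]))]

def linear_attend(q, keys, values):
    # Linear (fast) attention: precompute S = sum_j phi(k_j) v_j^T and
    # z = sum_j phi(k_j); this reordering avoids per-pair softmax scores.
    dk, dv = len(q), len(values[0])
    S = [[0.0] * dv for _ in range(dk)]
    z = [0.0] * dk
    for k, v in zip(keys, values):
        fk = phi(k)
        for a in range(dk):
            z[a] += fk[a]
            for c in range(dv):
                S[a][c] += fk[a] * v[c]
    fq = phi(q)
    denom = dot(fq, z)
    return [sum(fq[a] * S[a][c] for a in range(dk)) / denom for c in range(dv)]

def hybrid_attend(q, keys, values, top_k, alpha):
    # Hypothetical hybrid: top_k keys -> sparse branch, the rest -> linear
    # branch. alpha is a fixed blend weight standing in for a learned router.
    scores = [dot(q, k) for k in keys]
    order = sorted(range(len(keys)), key=lambda j: scores[j], reverse=True)
    keep, rest = order[:top_k], order[top_k:]
    sparse = softmax_attend(q, [keys[j] for j in keep], [values[j] for j in keep])
    if not rest:
        return sparse
    lin = linear_attend(q, [keys[j] for j in rest], [values[j] for j in rest])
    return [alpha * s + (1 - alpha) * l for s, l in zip(sparse, lin)]
```

For example, `hybrid_attend([0.5, -0.2], keys, values, top_k=2, alpha=0.7)` sends the two best-matching keys through softmax and the rest through linear attention. This sketch still scores every key, so it illustrates the split rather than the speedup; a real implementation would pick the sparse set cheaply (for example at block granularity) and learn the routing.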
#Sparse Attention #Linear Attention #SLA2