Elastic Attention: Test-time Adaptive Sparsity Ratios for Efficient Transformers
Beginner | Zecheng Tang, Quantong Qiu et al. | Jan 24 | arXiv
Transformers slow down on very long inputs because standard (full) attention scores every pair of tokens, so its compute and memory cost grows quadratically with sequence length.
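To make the cost concrete, here is a minimal sketch that counts attention score entries for full attention versus a generic fixed-window sparse pattern. The function names and the window size of 256 are illustrative assumptions; the windowed pattern is a stand-in for sparse attention in general, not the method proposed in the paper.

```python
def full_attention_pairs(n: int) -> int:
    """Score entries when every token attends to every token."""
    return n * n


def windowed_attention_pairs(n: int, window: int) -> int:
    """Upper bound on score entries when each token attends to at most
    `window` tokens (hypothetical fixed-window sparsity)."""
    return n * min(window, n)


if __name__ == "__main__":
    # Compare how the two counts grow as the sequence gets longer.
    for n in (1_000, 10_000, 100_000):
        full = full_attention_pairs(n)
        sparse = windowed_attention_pairs(n, window=256)
        print(f"n={n:>7}  full={full:>15,}  windowed={sparse:>12,}")
```

Growing the input by 10x multiplies the full-attention count by 100x, while the windowed count grows only 10x once the input is longer than the window.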
#elastic attention #sparse attention #full attention