MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head
Intermediate · Kewei Zhang, Ye Huang et al. · Jan 12 · arXiv
Transformers are powerful but slow on long inputs because standard self-attention compares every token with every other token, so its cost grows quadratically with sequence length.
#Multi-Head Linear Attention #Linear Attention #Self-Attention
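The quadratic cost described in the summary, and the reordering trick that linear attention generally uses to avoid it, can be sketched in plain Python. This is a generic, single-head, non-causal illustration and not the MHLA method from the paper. The `elu(x) + 1` feature map, the function names, and the toy inputs are all assumptions chosen for brevity.

```python
import math


def feature_map(x: list[float]) -> list[float]:
    # Assumed positive feature map: elu(x) + 1 (x + 1 for x > 0, exp(x) otherwise).
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def softmax_attention(q, k, v):
    """Standard attention: every query is scored against every key, O(n^2 * d)."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) / z
                    for c in range(len(v[0]))])
    return out


def kernel_attention_quadratic(q, k, v):
    """Linear-attention similarity phi(q).phi(k), but still evaluated pairwise: O(n^2 * d)."""
    fq = [feature_map(x) for x in q]
    fk = [feature_map(x) for x in k]
    out = []
    for qi in fq:
        weights = [sum(a * b for a, b in zip(qi, kj)) for kj in fk]
        z = sum(weights)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) / z
                    for c in range(len(v[0]))])
    return out


def linear_attention(q, k, v):
    """Same result, reordered: summarize all keys/values once, O(n * d * d_v)."""
    fk = [feature_map(x) for x in k]
    d, dv = len(fk[0]), len(v[0])
    s = [[0.0] * dv for _ in range(d)]  # S = sum_j phi(k_j)^T v_j, shape d x d_v
    z = [0.0] * d                       # z = sum_j phi(k_j), shape d
    for kj, vj in zip(fk, v):
        for a in range(d):
            z[a] += kj[a]
            for c in range(dv):
                s[a][c] += kj[a] * vj[c]
    out = []
    for qi in q:
        fq = feature_map(qi)
        denom = sum(fq[a] * z[a] for a in range(d))
        out.append([sum(fq[a] * s[a][c] for a in range(d)) / denom for c in range(dv)])
    return out


# Toy example inputs (hypothetical): 3 tokens, head dimension 2.
q = [[0.1, -0.2], [0.3, 0.5], [-0.4, 0.2]]
k = [[0.2, 0.1], [-0.3, 0.4], [0.5, -0.1]]
v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
```

`kernel_attention_quadratic` and `linear_attention` return the same values; the only difference is the order of multiplication. The linear version never builds the n×n score matrix, so its cost grows linearly with sequence length, but every query reads from the same fixed-size summary `s`.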