How I Study AI - Learn AI Papers & Lectures the Easy Way

MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head

Intermediate

Kewei Zhang, Ye Huang et al., Jan 12, arXiv

Transformers are powerful but slow on long inputs because standard self-attention compares every token with every other token, so its compute and memory cost grows quadratically with sequence length.
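To make the cost difference concrete, here is a minimal sketch in plain Python contrasting standard softmax attention with a generic kernelized linear attention. It is not the MHLA method from the paper, whose token-level multi-head mechanism is not described in this summary. The feature map `phi` (elu(x) + 1), the function names, and the toy sizes are all assumptions chosen for illustration.

```python
import math
import random


def softmax_attention(Q, K, V):
    """Standard attention: one score per (query, key) pair, so n * n scores."""
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) / total
                    for j in range(len(V[0]))])
    return out


def phi(x):
    """Assumed positive feature map: elu(x) + 1."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]


def linear_attention(Q, K, V):
    """Kernelized attention: summarize keys and values once, reuse per query."""
    d, dv = len(K[0]), len(V[0])
    S = [[0.0] * dv for _ in range(d)]  # sum over tokens of phi(k) v^T
    z = [0.0] * d                       # sum over tokens of phi(k)
    for k, v in zip(K, V):
        fk = phi(k)
        for a in range(d):
            z[a] += fk[a]
            for b in range(dv):
                S[a][b] += fk[a] * v[b]
    out = []
    for q in Q:
        fq = phi(q)
        denom = sum(fq[a] * z[a] for a in range(d))
        out.append([sum(fq[a] * S[a][b] for a in range(d)) / denom
                    for b in range(dv)])
    return out


def pairwise_kernel_attention(Q, K, V):
    """Same kernel as linear_attention, computed the slow pairwise way."""
    out = []
    for q in Q:
        fq = phi(q)
        weights = [sum(a * b for a, b in zip(fq, phi(k))) for k in K]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) / total
                    for j in range(len(V[0]))])
    return out


# Toy input: n tokens of dimension d (sizes are arbitrary).
rng = random.Random(0)
n, d = 6, 4
Q = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
K = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
V = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]

fast = linear_attention(Q, K, V)
slow = pairwise_kernel_attention(Q, K, V)
```

In this sketch `linear_attention` touches each token a constant number of times, so its work scales linearly with `n`, while `softmax_attention` and `pairwise_kernel_attention` both do work proportional to `n * n`. The two kernel versions give the same result up to floating-point rounding, because the sum over keys can be reordered once the softmax is replaced by a product of feature maps.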

#Multi-Head Linear Attention, #Linear Attention, #Self-Attention
