Error-Free Linear Attention is a Free Lunch: Exact Solution from Continuous-Time Dynamics
Intermediate | Jingdi Lei, Di Zhang et al. | Dec 14 | arXiv
Standard attention is slow for long texts because it compares every word (token) with every other one, so its running time grows quadratically with the length of the text.
#error-free linear attention  #rank-1 matrix exponential  #continuous-time dynamics
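
To make the quadratic cost in the summary concrete, here is a minimal pure-Python sketch of naive softmax attention that counts how many query-key comparisons it performs. It is an illustration only, not code from the paper: the function names `attention` and `toy_sequence` and the toy data are assumptions made for this example.

```python
import math


def attention(queries, keys, values):
    """Naive softmax attention over lists of equal-length vectors.

    Returns (outputs, n_scores), where n_scores is the number of
    query-key dot products that were computed.
    """
    n_scores = 0
    outputs = []
    scale = 1.0 / math.sqrt(len(queries[0]))
    for q in queries:
        # One score per key: this inner loop is what makes the cost quadratic.
        scores = []
        for k in keys:
            scores.append(scale * sum(qi * ki for qi, ki in zip(q, k)))
            n_scores += 1
        # Numerically stable softmax over the scores for this query.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        # Weighted average of the value vectors.
        dim = len(values[0])
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]
        )
    return outputs, n_scores


def toy_sequence(n, d=4):
    """Hypothetical deterministic toy embeddings: n tokens of dimension d."""
    return [[math.sin(i + j) for j in range(d)] for i in range(n)]


if __name__ == "__main__":
    for n in (8, 16, 32):
        x = toy_sequence(n)
        _, count = attention(x, x, x)
        print(n, count)  # count equals n * n
```

Doubling the number of tokens quadruples the number of comparisons (64, 256, 1024 for 8, 16, 32 tokens), which is the scaling that linear-attention methods aim to avoid.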