Recurrent neural networks (RNNs) are fast but forgetful: they compress everything they've seen into a small, fixed-size memory.
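A minimal sketch of that fixed-memory bottleneck (all names and sizes here are illustrative, not from any particular paper): no matter how long the input sequence is, a vanilla RNN folds it into one hidden state of constant size.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden = 4, 8
W_x = rng.normal(size=(d_hidden, d_in)) * 0.1     # input-to-hidden weights
W_h = rng.normal(size=(d_hidden, d_hidden)) * 0.1  # hidden-to-hidden weights

def rnn_encode(xs):
    """Fold a sequence of input vectors into a single d_hidden-sized state."""
    h = np.zeros(d_hidden)
    for x in xs:
        h = np.tanh(W_x @ x + W_h @ h)  # the state never grows with length
    return h

short_history = rng.normal(size=(5, d_in))
long_history = rng.normal(size=(5000, d_in))

# Both histories end up squeezed into the same fixed-size memory:
assert rnn_encode(short_history).shape == (d_hidden,)
assert rnn_encode(long_history).shape == (d_hidden,)
```

That constant-size state is why RNNs stay fast on long inputs, and also why they forget: every new token overwrites part of the same small memory.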
The paper shows that Test-Time Training (TTT) with key–value (KV) binding is not memorizing entries like a notebook; it is equivalent to a learned linear attention layer.
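A hedged sketch of the equivalence being claimed, under simplifying assumptions (matrix memory `S`, learning rate 1, and an inner-product binding loss; the names are illustrative, not the paper's notation): one SGD step per token on a KV-binding loss updates the memory exactly the way linear attention accumulates outer products.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 6
keys = rng.normal(size=(10, d))
vals = rng.normal(size=(10, d))

# Linear attention view: the state accumulates outer products v k^T.
S_linattn = np.zeros((d, d))
for k, v in zip(keys, vals):
    S_linattn += np.outer(v, k)

# TTT view: start from W = 0 and take one SGD step per token on the
# binding loss L(W) = -v @ (W @ k), whose gradient is -outer(v, k).
S_ttt = np.zeros((d, d))
lr = 1.0
for k, v in zip(keys, vals):
    grad = -np.outer(v, k)
    S_ttt -= lr * grad  # SGD step: W <- W - lr * dL/dW

# The two update rules produce the same memory state...
assert np.allclose(S_linattn, S_ttt)

# ...and reading it out with a query is the same in both views:
q = rng.normal(size=d)
out = S_ttt @ q
```

Under these assumptions, "TTT with KV binding" and "linear attention" are two descriptions of the same state-update rule, which is the sense in which the memory is learned rather than a verbatim notebook.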
The paper proposes HyTRec, a recommender system that reads very long user histories quickly while still paying sharp attention to the latest clicks and purchases.
SLA2 speeds up attention by splitting the work between two helpers: a precise one (sparse attention) and a fast one (linear attention).
Transformers are powerful but slow because regular self-attention compares every token with every other token, a cost that grows quadratically with sequence length.
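The quadratic cost is easy to see in code. A minimal sketch of standard softmax self-attention (sequence length and head dimension here are arbitrary): the score matrix holds one entry per token pair, so its size is n × n.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 128, 16  # sequence length, head dimension (illustrative values)
Q = rng.normal(size=(n, d))
K = rng.normal(size=(n, d))
V = rng.normal(size=(n, d))

# Every token is compared with every other token: n^2 dot products.
scores = Q @ K.T / np.sqrt(d)               # shape (n, n)

# Row-wise softmax (shifted by the row max for numerical stability).
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

out = weights @ V                            # shape (n, d)

assert scores.shape == (n, n)  # doubling n quadruples this matrix
assert out.shape == (n, d)
```

Doubling the sequence length quadruples both the memory for `scores` and the work to fill it, which is exactly the bottleneck that sparse, linear, and hybrid attention schemes try to avoid.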