MiniCPM-SALA is a 9B-parameter language model that mixes two kinds of attention, sparse and linear, so it can process very long texts quickly without losing accuracy.
This paper shows how to convert a large pretrained Transformer into this faster hybrid, which interleaves attention layers with RNN-style linear-attention layers, using far less training data than pretraining from scratch (about 2.3B tokens).
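To make the hybrid idea concrete, below is a minimal PyTorch sketch of such a layer stack, assuming a sliding-window pattern for the sparse layers and an `elu(x)+1` feature map for the linear layers. The layer ratio, the window size, and all names (`LinearAttention`, `WindowAttention`, `build_hybrid_stack`, `sparse_every`) are illustrative assumptions, not the paper's actual configuration; the "RNN" framing comes from the fact that causal linear attention can be computed as a recurrence over the sequence.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearAttention(nn.Module):
    """O(n) attention via a positive feature map (assumed: elu(x)+1).

    Computes phi(Q) @ (phi(K)^T V) instead of softmax(Q K^T) V, so the cost
    is linear in sequence length. Causal masking is omitted for brevity.
    """
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.heads = heads
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.proj = nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape
        h, hd = self.heads, d // self.heads
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(b, n, h, hd).transpose(1, 2) for t in (q, k, v))
        q, k = F.elu(q) + 1, F.elu(k) + 1           # positive feature map
        kv = torch.einsum("bhnd,bhne->bhde", k, v)  # sum over positions: O(n)
        z = 1.0 / (torch.einsum("bhnd,bhd->bhn", q, k.sum(dim=2)) + 1e-6)
        out = torch.einsum("bhnd,bhde,bhn->bhne", q, kv, z)
        return self.proj(out.transpose(1, 2).reshape(b, n, d))

class WindowAttention(nn.Module):
    """Sparse-attention stand-in: softmax attention restricted to a local window."""
    def __init__(self, dim: int, heads: int = 8, window: int = 256):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.window = window

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n = x.size(1)
        idx = torch.arange(n, device=x.device)
        # Block attention between positions farther apart than the window.
        mask = (idx[None, :] - idx[:, None]).abs() > self.window
        out, _ = self.attn(x, x, x, attn_mask=mask)
        return out

def build_hybrid_stack(dim: int, num_layers: int, sparse_every: int = 4) -> nn.ModuleList:
    """Interleave layer types: one sparse layer per (sparse_every - 1) linear ones."""
    return nn.ModuleList(
        WindowAttention(dim) if (i + 1) % sparse_every == 0 else LinearAttention(dim)
        for i in range(num_layers)
    )
```

As a usage sketch, `for layer in build_hybrid_stack(1024, 32): x = x + layer(x)` applies the stack with residual connections; a real model would also include feed-forward blocks, normalization, and causal masking. The conversion angle of the paper is that such a stack can reuse the pretrained Transformer's weights and be adapted with a small token budget rather than trained from scratch.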