How I Study AI - Learn AI Papers & Lectures the Easy Way

Hybrid Linear Attention Done Right: Efficient Distillation and Effective Architectures for Extremely Long Contexts

Yingfa Chen, Zhen Leng Thai et al. | Jan 29 | arXiv

This paper shows how to turn a big Transformer model into a faster hybrid model that mixes attention and RNN layers using far less training data (about 2.3B tokens).

#hybrid attention, #RNN attention hybrid, #linear attention

Fast-weight Product Key Memory

Intermediate

Tianyu Zhao, Llion Jones | Jan 2 | arXiv

The paper introduces Fast-weight Product Key Memory (FwPKM), a memory layer that can quickly learn from the current text it reads, not just from past training.

#Fast-weight memory, #Product Key Memory, #Sparse retrieval

End-to-End Test-Time Training for Long Context

Intermediate

Arnuv Tandon, Karan Dalal et al. | Dec 29 | arXiv

This paper shows how a language model can keep learning while you use it, so it handles very long inputs without slowing down.

#Test-Time Training, #Meta-learning, #Long-context language modeling
