How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (4)


Memory Caching: RNNs with Growing Memory

Beginner
Ali Behrouz, Zeman Li et al. Β· Feb 27 Β· arXiv

Recurrent neural networks (RNNs) are fast but forgetful because they squeeze everything they’ve seen into a tiny, fixed memory.

#Memory Caching Β· #Recurrent Neural Networks Β· #Attention
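The "tiny, fixed memory" can be seen in a vanilla RNN: the hidden state has the same size no matter how long the input is. A minimal sketch (hypothetical illustration, not the paper's code):

```python
import math

def rnn_step(h, x, w_h=0.5, w_x=1.0):
    # One recurrence step: mix old state and new input, then squash with tanh.
    return [math.tanh(w_h * hi + w_x * xi) for hi, xi in zip(h, x)]

hidden_size = 4                         # fixed, regardless of sequence length
h = [0.0] * hidden_size
sequence = [[0.1] * hidden_size, [0.5] * hidden_size, [0.9] * hidden_size]
for x in sequence:
    h = rnn_step(h, x)

print(len(h))  # state size never grows: 4
```

Everything the model has read must be compressed into those few numbers, which is why plain RNNs forget on long inputs.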

Hybrid Linear Attention Done Right: Efficient Distillation and Effective Architectures for Extremely Long Contexts

Intermediate
Yingfa Chen, Zhen Leng Thai et al. Β· Jan 29 Β· arXiv

This paper shows how to turn a big Transformer model into a faster hybrid model that mixes attention and RNN layers using far less training data (about 2.3B tokens).

#hybrid attention Β· #RNN attention hybrid Β· #linear attention
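The basic idea of a hybrid stack is to keep a few exact (but quadratic-cost) attention layers and replace the rest with linear-time layers. A toy sketch of one such layout (hypothetical ratio, not the paper's exact architecture):

```python
def build_hybrid_stack(num_layers, attn_every=4):
    # Keep one full-attention layer every `attn_every` layers (exact, O(n^2));
    # the remaining layers use linear attention / RNN-style mixing (O(n)).
    return ["attention" if i % attn_every == 0 else "linear"
            for i in range(num_layers)]

stack = build_hybrid_stack(8)
print(stack)
```

Distillation then transfers the original Transformer's behavior into this cheaper stack, which is why only a few billion tokens are needed rather than full retraining.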

Fast-weight Product Key Memory

Intermediate
Tianyu Zhao, Llion Jones Β· Jan 2 Β· arXiv

The paper introduces Fast-weight Product Key Memory (FwPKM), a memory layer that can quickly learn from the current text it reads, not just from past training.

#Fast-weight memory Β· #Product Key Memory Β· #Sparse retrieval
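Product-key lookup is what makes a huge memory cheap to query: the query is split in half, each half is scored against a small sub-key table, and the Cartesian product of the per-half top matches addresses the full memory. A simplified sketch (hypothetical, not the paper's implementation):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def product_key_lookup(query, sub_keys_a, sub_keys_b, top_k=2):
    # Split the query; score each half against its own small codebook.
    qa, qb = query[: len(query) // 2], query[len(query) // 2:]
    top_a = sorted(range(len(sub_keys_a)), key=lambda i: -dot(qa, sub_keys_a[i]))[:top_k]
    top_b = sorted(range(len(sub_keys_b)), key=lambda j: -dot(qb, sub_keys_b[j]))[:top_k]
    # Each pair (i, j) addresses slot i * len(sub_keys_b) + j of the big memory,
    # so N slots are reachable with only ~2*sqrt(N) dot products.
    pairs = [(i, j) for i in top_a for j in top_b]
    best = max(pairs, key=lambda p: dot(qa, sub_keys_a[p[0]]) + dot(qb, sub_keys_b[p[1]]))
    return best[0] * len(sub_keys_b) + best[1]

sub_keys_a = [[1.0, 0.0], [0.0, 1.0]]
sub_keys_b = [[1.0, 0.0], [0.0, 1.0]]
slot = product_key_lookup([1.0, 0.0, 0.0, 1.0], sub_keys_a, sub_keys_b)
print(slot)  # best-matching slot: 1
```

The "fast-weight" part of FwPKM then updates the retrieved memory values on the fly as new text arrives, rather than keeping them frozen after training.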

End-to-End Test-Time Training for Long Context

Intermediate
Arnuv Tandon, Karan Dalal et al. Β· Dec 29 Β· arXiv

This paper shows how a language model can keep learning while you use it, so it handles very long inputs without slowing down.

#Test-Time Training Β· #Meta-learning Β· #Long-context language modeling