Transformers are powerful but slow on long inputs because standard self-attention compares every token with every other token, so compute and memory grow quadratically with sequence length.
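To make the quadratic growth concrete, here is a minimal NumPy sketch (the sizes are illustrative, not from the paper): the attention score matrix is n × n, so doubling the sequence length quadruples the number of comparisons.

```python
import numpy as np

d = 64  # feature size per token (illustrative)

for n in (512, 1024, 2048):    # sequence lengths (illustrative)
    Q = np.random.randn(n, d)  # one query vector per token
    K = np.random.randn(n, d)  # one key vector per token
    scores = Q @ K.T           # every token scored against every other: shape (n, n)
    print(n, scores.size)      # 262144, 1048576, 4194304 -- doubling n quadruples it
```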
Recurrent Neural Networks (RNNs) are neural networks that process sequences, such as sentences or time series, one step at a time, carrying forward a hidden state that remembers what came before.
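As a rough illustration, here is a toy vanilla RNN step (not the paper's architecture; the weight names and sizes are invented): each new hidden state mixes the current input with the previous hidden state, which is why tokens must be processed in order and the model is hard to parallelize.

```python
import numpy as np

input_size, hidden_size = 8, 16                        # illustrative sizes
W_x = np.random.randn(hidden_size, input_size) * 0.1   # input-to-hidden weights
W_h = np.random.randn(hidden_size, hidden_size) * 0.1  # hidden-to-hidden weights

h = np.zeros(hidden_size)                              # the "memory" starts empty
sequence = [np.random.randn(input_size) for _ in range(5)]  # five toy tokens

for x in sequence:                  # sequential: step t needs the result of step t-1
    h = np.tanh(W_x @ x + W_h @ h)  # new memory depends on input and old memory
```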
The paper introduces the Transformer, a model that understands and generates sequences (like sentences) using only attention, without RNNs or CNNs.
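The core operation is scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. Here is a minimal NumPy sketch of that formula (a single head with no masking or learned projections; the real model adds both, plus multiple heads):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V               # weighted average of the values

n, d = 6, 64                         # illustrative sizes
x = np.random.randn(n, d)            # six toy token vectors
out = attention(x, x, x)             # self-attention: Q = K = V = x
print(out.shape)                     # (6, 64): one updated vector per token
```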