Transformers are powerful but slow on long inputs because standard self-attention compares every token with every other token, so compute and memory grow quadratically with sequence length.
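To make the quadratic growth concrete, here is a minimal NumPy sketch (the sizes are illustrative, not from the paper): the attention score matrix is n × n, so doubling the sequence length quadruples the number of comparisons.

```python
import numpy as np

d = 64  # feature size per token (illustrative)

for n in (512, 1024, 2048):    # sequence lengths (illustrative)
    Q = np.random.randn(n, d)  # one query vector per token
    K = np.random.randn(n, d)  # one key vector per token
    scores = Q @ K.T           # every token scored against every other: shape (n, n)
    print(n, scores.size)      # 262144, 1048576, 4194304 -- doubling n quadruples it
```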
Recurrent Neural Networks (RNNs) are neural networks that process sequences, such as sentences or time series, one step at a time, carrying forward a hidden state that remembers what came before.
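As a rough illustration, here is a toy vanilla RNN step (not the paper's architecture; the weight names and sizes are invented): each new hidden state mixes the current input with the previous hidden state, which is why tokens must be processed in order and the model is hard to parallelize.

```python
import numpy as np

input_size, hidden_size = 8, 16                        # illustrative sizes
W_x = np.random.randn(hidden_size, input_size) * 0.1   # input-to-hidden weights
W_h = np.random.randn(hidden_size, hidden_size) * 0.1  # hidden-to-hidden weights

h = np.zeros(hidden_size)                              # the "memory" starts empty
sequence = [np.random.randn(input_size) for _ in range(5)]  # five toy tokens

for x in sequence:                  # sequential: step t needs the result of step t-1
    h = np.tanh(W_x @ x + W_h @ h)  # new memory depends on input and old memory
```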
The paper introduces the Transformer, a model that understands and generates sequences (like sentences) using only attention, without RNNs or CNNs.
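The core operation is scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. Here is a minimal NumPy sketch of that formula (a single head with no masking or learned projections; the real model adds both, plus multiple heads):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V               # weighted average of the values

n, d = 6, 64                         # illustrative sizes
x = np.random.randn(n, d)            # six toy token vectors
out = attention(x, x, x)             # self-attention: Q = K = V = x
print(out.shape)                     # (6, 64): one updated vector per token
```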