Transformers map an input sequence to an output sequence by stacking layers, each of which applies self-attention followed by a position-wise feed-forward network, with each sublayer wrapped in a residual connection and LayerNorm.
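
To make this concrete, here is a minimal PyTorch sketch of one such layer. The module name `TransformerBlock` and the dimensions (`d_model=512`, `n_heads=8`, `d_ff=2048`) are illustrative assumptions, not taken from the text; the post-LN ordering (residual add, then LayerNorm) follows the original Transformer paper.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """One encoder layer: self-attention + feed-forward,
    each sublayer wrapped in a residual connection and LayerNorm.
    Hypothetical sketch; names and sizes are assumptions."""

    def __init__(self, d_model: int = 512, n_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Self-attention sublayer: residual add, then LayerNorm.
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + attn_out)
        # Feed-forward sublayer: residual add, then LayerNorm.
        x = self.norm2(x + self.ff(x))
        return x

# Usage: a (batch, seq_len, d_model) sequence maps to one of the same shape.
x = torch.randn(2, 16, 512)
y = TransformerBlock()(x)
assert y.shape == x.shape
```

Because every sublayer preserves the `d_model` dimension, layers like this can be stacked to arbitrary depth, which is what "map sequences to sequences" means in practice: the sequence length and width are unchanged end to end.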