Transformer Expressiveness
Transformer expressiveness studies what kinds of sequence-to-sequence mappings a Transformer can represent or approximate.
Transformers map sequences to sequences using layers of self-attention and feed-forward networks wrapped with residual connections and LayerNorm.
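To make the sequence-to-sequence mapping concrete, here is a minimal sketch of one post-LN Transformer layer in NumPy: single-head scaled dot-product self-attention, a ReLU feed-forward network, and each sublayer wrapped with a residual connection followed by LayerNorm. All names and dimensions (`transformer_layer`, `d`, `d_ff`, the random weights) are illustrative assumptions, not a reference implementation.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each position's feature vector to zero mean, unit variance.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def self_attention(x, Wq, Wk, Wv):
    # Single-head scaled dot-product self-attention over a length-n sequence.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)  # softmax over key positions
    return w @ v

def transformer_layer(x, Wq, Wk, Wv, W1, W2):
    # Post-LN layer: attention sublayer, then feed-forward sublayer,
    # each with residual connection followed by LayerNorm.
    x = layer_norm(x + self_attention(x, Wq, Wk, Wv))
    ffn = np.maximum(0.0, x @ W1) @ W2  # ReLU feed-forward network
    return layer_norm(x + ffn)

rng = np.random.default_rng(0)
n, d, d_ff = 5, 8, 16  # sequence length, model width, hidden width
x = rng.standard_normal((n, d))
shapes = [(d, d)] * 3 + [(d, d_ff), (d_ff, d)]
weights = [rng.standard_normal(s) for s in shapes]
y = transformer_layer(x, *weights)
print(y.shape)  # the map is sequence-to-sequence: output shape equals input shape
```

The layer maps an n-by-d input to an n-by-d output, which is why stacks of such layers compose into the sequence-to-sequence functions that expressiveness results analyze.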