Transformer Expressiveness
Transformer expressiveness studies what kinds of sequence-to-sequence mappings a Transformer can represent or approximate.
Self-attention can be viewed as message passing on a complete graph with self-loops: each token (node) aggregates attention-weighted messages from all tokens, including itself.
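To make the message-passing view concrete, here is a minimal NumPy sketch of single-head self-attention; the weight names (`Wq`, `Wk`, `Wv`) and the toy dimensions are illustrative assumptions, not taken from the source.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention as message passing on a complete graph:
    every token aggregates weighted messages from all tokens (self included)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv             # project tokens to queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])      # pairwise affinities: one score per edge
    scores -= scores.max(axis=-1, keepdims=True) # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: outgoing edge weights per node
    return weights @ V                           # each token sums messages over all edges

# Toy usage: 4 tokens, model dimension 8 (hypothetical sizes for illustration).
rng = np.random.default_rng(0)
n, d = 4, 8
X = rng.standard_normal((n, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one updated representation per token
```

Because the attention weights depend on the token contents rather than on a fixed adjacency structure, the "graph" here is dense and input-dependent, which is what distinguishes this form of message passing from that of a standard graph neural network.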