Self-attention can be viewed as message passing on a fully connected graph (with self-loops), where each token (node) aggregates a weighted combination of messages from every token, with the weights computed from query-key similarity.
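A minimal sketch of this view in NumPy: the softmax-normalized score matrix plays the role of edge weights on the complete token graph, and the output is each node's weighted aggregation of value "messages". The projection names (`Wq`, `Wk`, `Wv`) are illustrative placeholders, not from the original text.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention as message passing: every token
    aggregates value 'messages' from all tokens, weighted by the
    softmax of scaled query-key similarity."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # (n, n) edge weights
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: rows sum to 1
    return weights @ V                               # aggregate messages per node
```

Because the weight matrix is dense, every node can attend to every other node in a single step, which is precisely the "fully connected graph" reading.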
Matrix exponentiation turns n repeated linear transitions into a single matrix power, computed with only O(log n) matrix multiplications via exponentiation by squaring.
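A minimal sketch of the technique: square the base matrix repeatedly and multiply it into the result whenever the corresponding bit of the exponent is set. The Fibonacci transition matrix in the usage note is a standard worked example, not taken from the text above.

```python
import numpy as np

def mat_pow(A, n):
    """Compute A**n with O(log n) matrix multiplications
    (exponentiation by squaring); n must be a non-negative integer."""
    result = np.eye(A.shape[0], dtype=A.dtype)
    while n > 0:
        if n & 1:              # this bit of the exponent is set:
            result = result @ A  # fold current power into the result
        A = A @ A              # square: A, A^2, A^4, ...
        n >>= 1
    return result
```

For example, powering the transition matrix `[[1, 1], [1, 0]]` yields Fibonacci numbers: the `(0, 1)` entry of its n-th power is F(n), so ten steps of the recurrence collapse into about four matrix multiplications.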