Self-attention can be viewed as message passing on a fully connected graph: every token (node) receives a message from every other token, weighted by the softmax of query-key similarity.
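A minimal sketch of this view using numpy: the softmaxed score matrix plays the role of edge weights on the fully connected token graph, and the output is the weighted aggregation of value "messages". The projection matrices and shapes here are illustrative, not from the original text.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention as message passing on a complete graph."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])        # edge logits between all token pairs
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)      # softmax: each node's incoming edge weights sum to 1
    return weights @ V, weights                    # aggregate weighted value messages

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                        # 4 tokens (nodes), 8-dim features
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, weights = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one updated representation per token
```

Note that `weights` is a dense 4x4 matrix: every node attends to every node, which is exactly the fully connected graph in the analogy.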
Graph Neural Networks (GNNs) generalize this to arbitrary graphs: each layer lets every node aggregate messages from its neighbors and update its own representation, and stacking layers propagates information over longer paths.
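The aggregate-and-update step can be sketched in a few lines. This is a generic mean-aggregation layer under assumed names (`gnn_layer`, the weight matrices, the example graph are all illustrative), not any particular published architecture:

```python
import numpy as np

def gnn_layer(H, adj, W_self, W_neigh):
    """One GNN layer: mean-aggregate neighbor features, then update with a ReLU."""
    deg = adj.sum(axis=1, keepdims=True)
    agg = (adj @ H) / np.maximum(deg, 1)                # mean of each node's neighbor features
    return np.maximum(0.0, H @ W_self + agg @ W_neigh)  # combine self and neighbor messages

# Small undirected graph: edges 0-1, 1-2, 0-2, 2-3
adj = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (0, 2), (2, 3)]:
    adj[i, j] = adj[j, i] = 1.0

rng = np.random.default_rng(1)
H = rng.normal(size=(4, 5))                             # 4 nodes, 5-dim features
W_self, W_neigh = rng.normal(size=(5, 5)), rng.normal(size=(5, 5))
H_next = gnn_layer(H, adj, W_self, W_neigh)
print(H_next.shape)  # (4, 5): updated representation per node
```

Replacing `adj` with an all-ones matrix (minus the structure of learned, input-dependent weights) is what recovers the fully connected graph of self-attention; in a GNN the sparsity of `adj` restricts which messages each node receives.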