📚TheoryIntermediate
Self-Attention as Graph Neural Network
Self-attention can be viewed as message passing on a fully connected graph where each token (node) sends a weighted message to every other token.
#self-attention#graph neural network#message passing+11