Groups
Category
Key-Value memory systems store information as pairs where keys are used to look up values by similarity rather than exact match.
Self-attention can be viewed as message passing on a fully connected graph where each token (node) sends a weighted message to every other token.