Self-attention is permutation-equivariant by default: reordering the input tokens merely reorders the outputs, so Transformers need positional encodings to represent word order in sequences.
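The text doesn't say which encoding scheme is meant; as a minimal sketch, here is the fixed sinusoidal scheme from "Attention Is All You Need", which maps each position to sines and cosines at geometrically spaced frequencies (assuming an even model dimension `d_model`):

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Return a (seq_len, d_model) matrix of sinusoidal position encodings.

    Assumes d_model is even, as in the standard formulation.
    """
    positions = np.arange(seq_len)[:, np.newaxis]            # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[np.newaxis, :]           # (1, d_model/2)
    # Wavelengths form a geometric progression from 2*pi to 10000*2*pi.
    angles = positions / np.power(10000.0, dims / d_model)   # (seq_len, d_model/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions get sine
    pe[:, 1::2] = np.cos(angles)   # odd dimensions get cosine
    return pe

# The encoding is added to the token embeddings, giving each position
# a distinct signature: x = token_embeddings + sinusoidal_positional_encoding(...)
```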
Attention computes a weighted sum of the values V, where the weights reflect how similar each query in Q is to each key in K.
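The sentence doesn't specify the similarity function; the standard Transformer choice is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. A minimal NumPy sketch, assuming a single head and unbatched 2-D inputs:

```python
import numpy as np

def scaled_dot_product_attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v) -> output: (n_q, d_v)."""
    d_k = Q.shape[-1]
    # Similarity of every query to every key, scaled to keep softmax well-behaved.
    scores = Q @ K.T / np.sqrt(d_k)                          # (n_q, n_k)
    # Row-wise softmax (max-subtracted for numerical stability).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a convex combination of the value rows.
    return weights @ V
```

The 1/sqrt(d_k) scaling keeps the dot products from growing with dimension, which would otherwise push the softmax into regions with vanishing gradients.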