Spectral normalization rescales a weight matrix so its largest singular value (spectral norm) is at most a target value, typically 1.
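The definition above can be sketched in pure Python. This is a minimal illustration under stated assumptions, not a reference implementation. The spectral norm is estimated with power iteration rather than a full SVD, and the matrix is rescaled only when that estimate exceeds the target, which matches the "at most a target value" wording. The helper names (`spectral_norm`, `spectral_normalize`) and the fixed iteration count are hypothetical choices made for this example.

```python
import math
import random

def _matvec(W, x):
    # W x, for a matrix stored as a list of rows
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def _rmatvec(W, y):
    # W^T y
    return [sum(W[i][j] * y[i] for i in range(len(W))) for j in range(len(W[0]))]

def _normalize(x, eps=1e-12):
    # Scale x to (approximately) unit length; eps avoids division by zero
    n = math.sqrt(sum(v * v for v in x))
    return [v / (n + eps) for v in x]

def spectral_norm(W, iters=50, seed=0):
    """Estimate sigma(W), the largest singular value, by power iteration (hypothetical helper)."""
    rng = random.Random(seed)
    v = _normalize([rng.gauss(0.0, 1.0) for _ in range(len(W[0]))])
    for _ in range(iters):
        u = _normalize(_matvec(W, v))   # estimate of the top left singular vector
        v = _normalize(_rmatvec(W, u))  # estimate of the top right singular vector
    # Once v has converged to the top right singular vector, ||W v|| = sigma(W)
    return math.sqrt(sum(t * t for t in _matvec(W, v)))

def spectral_normalize(W, target=1.0, iters=50):
    """Rescale W so its estimated spectral norm is at most `target` (hypothetical helper)."""
    sigma = spectral_norm(W, iters)
    scale = max(1.0, sigma / target)  # leave W unchanged if it is already within the target
    return [[w / scale for w in row] for row in W]
```

For example, `spectral_normalize([[3.0, 0.0], [0.0, 1.0]])` divides by the estimated spectral norm of 3, giving approximately `[[1.0, 0.0], [0.0, 0.333]]`. A matrix whose spectral norm is already below the target is returned unchanged. Training-time implementations typically keep the vector `u` between steps and run a single power-iteration step per update, which is cheaper than the fresh 50-step estimate used here.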
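Self-attention has no built-in notion of order. Without positional information, permuting the input tokens simply permutes the outputs (the layer is permutation-equivariant), so the model cannot tell word orders apart. Transformers therefore add positional encodings to the token embeddings to make word order available.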
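The sketch below shows one common choice, the fixed sinusoidal encoding from the original Transformer paper. It is a minimal pure-Python example and not the only option, since learned and relative position encodings are also widely used. Each pair of dimensions `(2k, 2k+1)` shares the frequency `1 / 10000^(2k / d_model)`, with sine on even indices and cosine on odd ones. The function names are hypothetical.

```python
import math

def sinusoidal_positional_encoding(seq_len, d_model):
    """Return a seq_len x d_model table of sinusoidal position vectors (hypothetical helper)."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            k = i // 2  # dimensions 2k and 2k+1 share one frequency
            angle = pos / (10000 ** (2 * k / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

def add_positional_encoding(embeddings):
    """Add position vectors element-wise to a list of token embeddings (hypothetical helper)."""
    pe = sinusoidal_positional_encoding(len(embeddings), len(embeddings[0]))
    return [[e + p for e, p in zip(erow, prow)] for erow, prow in zip(embeddings, pe)]

# Two identical token embeddings are indistinguishable on their own;
# after adding position vectors, the inputs at positions 0 and 1 differ.
tokens = [[1.0] * 6, [1.0] * 6]
encoded = add_positional_encoding(tokens)
```

Because the encoding is a fixed function of the position, it can be computed for any sequence length and needs no training.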