Label smoothing replaces a hard one-hot target with a slightly softened distribution to reduce model overconfidence.
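A minimal sketch of how the smoothed targets can be built, assuming a NumPy implementation; the function name and the epsilon value are illustrative:

```python
import numpy as np

def smooth_labels(labels, num_classes, epsilon=0.1):
    """Convert integer class labels to smoothed one-hot targets.

    Probability mass epsilon is spread uniformly over all classes,
    so the true class keeps (1 - epsilon) + epsilon / num_classes.
    """
    one_hot = np.eye(num_classes)[labels]
    return one_hot * (1.0 - epsilon) + epsilon / num_classes

# Two samples with true classes 0 and 2, over 3 classes.
targets = smooth_labels(np.array([0, 2]), num_classes=3, epsilon=0.1)
```

Each row still sums to 1, so the smoothed targets remain valid probability distributions for a cross-entropy loss.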
Self-attention treats its input as an unordered set (it is permutation-equivariant), so Transformers need positional encodings to represent word order in sequences.
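One common choice is the fixed sinusoidal encoding from the original Transformer paper; below is a minimal NumPy sketch (the function name is illustrative):

```python
import numpy as np

def sinusoidal_positions(seq_len, d_model):
    """Sinusoidal positional encodings.

    Even dimensions use sine, odd dimensions use cosine, with
    wavelengths forming a geometric progression up to 10000 * 2*pi.
    Each position gets a unique, deterministic vector of size d_model.
    """
    positions = np.arange(seq_len)[:, None]           # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]          # (1, d_model // 2)
    angles = positions / (10000 ** (dims / d_model))  # (seq_len, d_model // 2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = sinusoidal_positions(seq_len=10, d_model=8)
```

These vectors are typically added to the token embeddings before the first attention layer, giving the model access to both content and position.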