Groups
Focal Loss reshapes cross-entropy so that hard, misclassified examples get more focus while easy, well-classified ones are down-weighted.
Multi-Head Attention runs several attention mechanisms in parallel so each head can focus on different relationships in the data.