Multi-Head Attention runs several attention mechanisms in parallel so each head can focus on different relationships in the data.
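To make the idea concrete, here is a minimal NumPy sketch of multi-head self-attention. It is an illustration under stated assumptions, not a reference implementation: the function names (`softmax`, `multi_head_attention`), the weight names (`w_q`, `w_k`, `w_v`, `w_o`), the scaled dot-product scoring, and the toy sizes are all chosen for this example, and the weights are random rather than trained.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max along the axis for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Self-attention over x of shape (seq_len, d_model), split into num_heads heads."""
    seq_len, d_model = x.shape
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads

    def split_heads(t):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q = split_heads(x @ w_q)
    k = split_heads(x @ w_k)
    v = split_heads(x @ w_v)

    # Every head computes its own attention weights in parallel,
    # using scaled dot-product scores (an assumption of this sketch).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (num_heads, seq_len, seq_len)
    weights = softmax(scores, axis=-1)
    heads = weights @ v  # (num_heads, seq_len, d_head)

    # Concatenate the heads and mix them with an output projection.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o, weights


# Toy example with hypothetical sizes and random (untrained) weights.
rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 5, 8, 2
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v, w_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))
out, weights = multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads)
```

Here `out` has the same shape as `x`, and `weights` holds one `seq_len × seq_len` attention matrix per head. Comparing `weights[0]` with `weights[1]` shows how each head can distribute its attention differently over the same input.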