Standard softmax attention costs O(n²) in the sequence length n because every token is compared with every other token.
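
To make the quadratic term concrete, here is a minimal sketch of single-head softmax attention in plain Python. It assumes no masking, no batching, and queries, keys, and values given as lists of equal-length vectors. The names `naive_attention` and `num_pairwise_scores` are hypothetical helpers for illustration, not part of any library.

```python
import math


def softmax(row):
    # Subtract the row maximum before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def naive_attention(q, k, v):
    """Single-head softmax attention over n tokens (illustrative sketch)."""
    n, d = len(q), len(q[0])
    scale = 1.0 / math.sqrt(d)
    # The n x n score matrix: one scaled dot product per (query, key) pair.
    scores = [
        [scale * sum(a * b for a, b in zip(q[i], k[j])) for j in range(n)]
        for i in range(n)
    ]
    # Each row becomes a probability distribution over all n keys.
    weights = [softmax(row) for row in scores]
    # Each output vector is a weighted average of all n value vectors.
    dv = len(v[0])
    out = [
        [sum(weights[i][j] * v[j][c] for j in range(n)) for c in range(dv)]
        for i in range(n)
    ]
    return out, weights


def num_pairwise_scores(n):
    # Number of entries in the score matrix for a sequence of length n.
    return n * n
```

The score matrix has one entry per pair of tokens, so doubling the sequence length quadruples the number of entries: `num_pairwise_scores(2048)` is four times `num_pairwise_scores(1024)`.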