Standard softmax attention costs O(n²) in sequence length because every token compares with every other token.
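To make the quadratic cost concrete, here is a minimal pure-Python sketch of naive single-head attention that counts how many query-key scores it computes. The function name `softmax_attention`, the `score_count` counter, and the 1/sqrt(d) scaling are illustrative assumptions, not something the text above specifies.

```python
import math

def softmax_attention(queries, keys, values):
    """Naive single-head attention over lists of equal-length float vectors.

    Returns the output vectors and the number of query-key scores computed.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)  # assumed scaled dot-product variant
    score_count = 0
    output = []
    for q in queries:
        # One score per key: n scores for this query, n * n overall.
        scores = []
        for k in keys:
            scores.append(scale * sum(qi * ki for qi, ki in zip(q, k)))
            score_count += 1
        # Numerically stable softmax over this query's row of scores.
        row_max = max(scores)
        exps = [math.exp(s - row_max) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the value vectors.
        width = len(values[0])
        output.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(width)]
        )
    return output, score_count
```

Calling it with the same list of n token vectors for queries, keys, and values yields a `score_count` of n * n, so doubling the sequence length quadruples the number of scores.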
Budget roughly 10^8 simple operations per second on typical online judges, and scale that figure by the time limit and, if known, by the number of test files.
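The budgeting rule can be sketched as a quick feasibility check. The helper names `operation_budget` and `fits_budget` are hypothetical, the 10^8 figure is the rough estimate quoted above rather than a measured value, and the sketch covers a single test file only.

```python
OPS_PER_SECOND = 10**8  # rough rule-of-thumb figure, not a measured value

def operation_budget(time_limit_s, ops_per_second=OPS_PER_SECOND):
    """Approximate number of simple operations that fit in the time limit."""
    return int(time_limit_s * ops_per_second)

def fits_budget(estimated_ops, time_limit_s, ops_per_second=OPS_PER_SECOND):
    """True if the estimated operation count is within the budget."""
    return estimated_ops <= operation_budget(time_limit_s, ops_per_second)

# Example: n = 10^5 with a 2-second limit.
n = 10**5
quadratic_ops = n * n        # 10^10, far over a 2 * 10^8 budget
linearithmic_ops = n * 17    # n * log2(n), using log2(10^5) rounded up to 17
```

With these numbers, `fits_budget(quadratic_ops, 2.0)` is false and `fits_budget(linearithmic_ops, 2.0)` is true, which is the kind of quick check the rule is meant to support before writing any code.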