Groups
Category
Standard softmax attention costs O(n²) in sequence length because every token compares with every other token.
Online algorithms make decisions step by step without seeing the future and are judged against an all-knowing offline optimum.