Standard softmax attention costs O(n²) in sequence length because every token compares with every other token.
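To make the quadratic cost concrete, here is a minimal pure-Python sketch of naive single-head softmax attention. The function names, the toy input, and the scaling by the square root of the feature dimension are illustrative assumptions, not something taken from the text above. The point to notice is that the score table holds one number per (query, key) pair, so it has n * n entries.

```python
import math

def softmax(row):
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Naive single-head softmax attention on lists of vectors (illustrative)."""
    d = len(Q[0])
    scale = 1.0 / math.sqrt(d)
    # One score per (query, key) pair: n * n numbers in total.
    scores = [[scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K] for q in Q]
    weights = [softmax(row) for row in scores]
    # Each output vector is a weighted average of the value vectors.
    out = [[sum(w * v[j] for w, v in zip(row, V)) for j in range(len(V[0]))]
           for row in weights]
    return out, weights

# Toy input: n = 4 tokens with 2 features each (made-up values).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
out, weights = attention(X, X, X)
n_pairs = sum(len(row) for row in weights)
print(n_pairs)  # 16, i.e. 4 * 4
```

Doubling the number of tokens quadruples the number of score entries, which is where the O(n²) time and memory come from.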
An RKHS is a space of functions where evaluating a function at a point equals taking an inner product with a kernel section, which enables the “kernel trick.”
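In symbols, the property reads f(x) = <f, k(x, .)>, and applying it to kernel sections themselves gives k(x, y) = <k(x, .), k(y, .)>. A small sketch of what that buys in practice, assuming a homogeneous degree-2 polynomial kernel on 2-D inputs (a choice made here purely for illustration): the kernel value computed directly matches the inner product of explicit feature vectors, so the feature map never has to be built.

```python
import math

def poly_kernel(x, y):
    # Homogeneous degree-2 polynomial kernel: k(x, y) = (x . y)^2.
    return sum(a * b for a, b in zip(x, y)) ** 2

def feature_map(x):
    # Explicit feature map for 2-D inputs whose inner product equals poly_kernel.
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, y = [1.0, 2.0], [3.0, 4.0]
k_direct = poly_kernel(x, y)                      # no feature map needed
k_explicit = dot(feature_map(x), feature_map(y))  # inner product in feature space
agree = math.isclose(k_direct, k_explicit)
print(k_direct, agree)
```

Here the explicit feature space has only three dimensions, but for kernels such as the Gaussian it is infinite-dimensional, and evaluating k directly is the only practical route.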