Mamba uses a state-space model whose parameters are selected (gated) by the current input token, letting the model adapt its memory dynamics at each step.
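The input-dependent dynamics can be illustrated with a toy selective scan, a minimal sketch rather than the real Mamba kernel: all shapes and the scalar input channel here are illustrative assumptions, but each token contributes its own step size `delta`, input map `B`, and output map `C`, so the recurrence mixes memory differently at every step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy selective state-space scan (illustrative shapes, not the real Mamba kernel).
# Recurrence per token t:
#   h_t = exp(delta_t * A) * h_{t-1} + delta_t * B_t * x_t
#   y_t = C_t . h_t
# Because delta, B, and C are drawn per token (standing in for projections of
# the input), the state update is selected by the current input.
seq_len, d_state = 6, 4
A = -np.abs(rng.standard_normal(d_state))      # stable diagonal dynamics
x = rng.standard_normal(seq_len)               # scalar input channel
delta = np.abs(rng.standard_normal(seq_len))   # input-dependent step size
B = rng.standard_normal((seq_len, d_state))    # input-dependent input map
C = rng.standard_normal((seq_len, d_state))    # input-dependent output map

h = np.zeros(d_state)
ys = []
for t in range(seq_len):
    h = np.exp(delta[t] * A) * h + delta[t] * B[t] * x[t]
    ys.append(C[t] @ h)
y = np.array(ys)
print(y.shape)  # (6,)
```

Note the loop touches each token once, so the scan runs in time linear in sequence length, in contrast to attention.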
Standard softmax attention costs O(n²) in sequence length because every token compares with every other token.
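A minimal NumPy implementation makes the quadratic cost concrete: the score matrix has one entry per pair of tokens, so its size is n × n regardless of the model dimension.

```python
import numpy as np

def attention(Q, K, V):
    # scores is an (n, n) matrix: every query token compares with every key
    # token, which is where the O(n^2) time and memory come from.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V

n, d = 8, 16
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = attention(Q, K, V)
print(out.shape)  # (8, 16)
```

Doubling `n` quadruples the number of score entries, which is why long-context attention becomes expensive.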