Groups
Category
Mamba uses a state-space model whose parameters are selected (gated) by the current input token, letting the model adapt its memory dynamics at each step.
Spectral normalization rescales a weight matrix so its largest singular value (spectral norm) is at most a target value, typically 1.