Sharpness-Aware Minimization (SAM) trains models to perform well even when their weights are slightly perturbed: each update first ascends to a nearby worst-case point in weight space, then descends using the gradient from that point, seeking flatter minima that tend to generalize better.
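The two-step SAM update can be sketched as follows; this is a minimal NumPy illustration on a toy quadratic objective, where `loss`, `rho`, and `lr` are illustrative assumptions rather than values from any particular paper or library.

```python
import numpy as np

def loss(w):
    # Toy quadratic objective standing in for a real training loss (assumption).
    return 0.5 * np.sum(w ** 2)

def grad(w):
    # Gradient of the toy quadratic loss.
    return w

def sam_step(w, rho=0.05, lr=0.1):
    """One Sharpness-Aware Minimization step (two-step sketch).

    1. Ascend: move to the (approximate) worst-case point within an
       L2 ball of radius rho around the current weights.
    2. Descend: update the original weights using the gradient
       computed at that perturbed point.
    """
    g = grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # ascent direction, scaled to radius rho
    g_adv = grad(w + eps)                        # gradient at the perturbed weights
    return w - lr * g_adv                        # descent step from the ORIGINAL weights

w = np.array([3.0, -4.0])
for _ in range(50):
    w = sam_step(w)
```

Because the descent gradient is taken at the perturbed point, SAM penalizes directions where the loss rises sharply nearby, biasing optimization toward flatter regions.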
Gradient clipping limits either the magnitude of individual gradient components or the gradient's overall norm during optimization, preventing exploding updates from destabilizing training.
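Both common variants, clipping by value and clipping by global norm, can be sketched in a few lines of NumPy; the function names and the `max_norm` threshold here are illustrative assumptions.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their combined L2 norm
    does not exceed max_norm; gradients already within the limit
    are left unchanged."""
    total = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (total + 1e-12))
    return [g * scale for g in grads]

def clip_by_value(g, limit):
    """Clamp each gradient entry into [-limit, limit] independently."""
    return np.clip(g, -limit, limit)

# Global norm of these gradients is sqrt(9 + 16 + 144) = 13.
grads = [np.array([3.0, 4.0]), np.array([12.0])]
clipped = clip_by_global_norm(grads, 1.0)
```

Norm-based clipping preserves the gradient's direction while shrinking its length, whereas value-based clipping can change the direction because each component is clamped independently.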