Groups
Sharpness-Aware Minimization (SAM) trains models to perform well even when their weights are slightly perturbed: instead of minimizing the loss at the current weights, it minimizes the worst-case loss within a small neighborhood around them, steering optimization toward flatter minima that tend to generalize better.
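A minimal sketch of one SAM step on a toy quadratic with an analytic gradient; the names `sam_step`, `rho`, and `lr` are illustrative, not from any library. The step first ascends to the (approximate) worst-case point within an L2 ball of radius `rho`, then descends using the gradient taken there:

```python
import numpy as np

# Toy objective with a known analytic gradient.
def loss(w):
    return 0.5 * np.dot(w, w)

def grad(w):
    return w

def sam_step(w, lr=0.1, rho=0.05):
    """One SAM step: perturb the weights toward the worst-case
    point in an L2 ball of radius rho, then update using the
    gradient evaluated at that perturbed point."""
    g = grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # first-order worst-case perturbation
    g_sam = grad(w + eps)                        # gradient at the perturbed weights
    return w - lr * g_sam

w = np.array([3.0, -2.0])
for _ in range(50):
    w = sam_step(w)
print(w, loss(w))
```

In practice each `grad` call is a full forward/backward pass, so SAM roughly doubles the cost of a training step in exchange for the flatness bias.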
Momentum methods accumulate an exponentially weighted memory of past gradients, so successive steps reinforce consistent directions and cancel oscillating ones; this smooths and accelerates descent, especially in ravines and other ill-conditioned problems where plain gradient descent zigzags.
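A minimal sketch of heavy-ball momentum on an ill-conditioned quadratic (steep in one coordinate, shallow in the other, i.e. a ravine); `momentum_step`, `beta`, and `lr` are illustrative names under these assumptions:

```python
import numpy as np

# Ill-conditioned quadratic: curvature 100 along w[0], 1 along w[1].
def grad(w):
    return np.array([100.0 * w[0], w[1]])

def momentum_step(w, v, lr=0.01, beta=0.9):
    """One heavy-ball step: the velocity v is an exponentially
    weighted sum of past gradients, and the weights move along it."""
    g = grad(w)
    v = beta * v + g   # decay old memory, fold in the new gradient
    w = w - lr * v     # step along the smoothed direction
    return w, v

w = np.array([1.0, 1.0])
v = np.zeros_like(w)
for _ in range(100):
    w, v = momentum_step(w, v)
print(w)
```

With `beta = 0` this reduces to plain gradient descent; raising `beta` toward 1 lengthens the gradient memory, damping oscillation across the steep axis while building speed along the shallow one.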