Groups
A proximal operator pulls a point x toward minimizing a function f while penalizing how far it moves, acting like a denoiser or projector depending on f.
Momentum methods add an exponentially weighted memory of past gradients to make descent steps smoother and faster, especially in ravines and ill-conditioned problems.