Groups
Category
Momentum methods add an exponentially weighted memory of past gradients to make descent steps smoother and faster, especially in ravines and ill-conditioned problems.
Stochastic Gradient Descent (SGD) updates model parameters using small random subsets (mini-batches) of data, making learning faster and more memory-efficient.