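L2 regularization adds a penalty proportional to the sum of squared weights to the training loss, which discourages large parameter values. In linear regression it is known as ridge regression, and under plain SGD it is equivalent to weight decay; with adaptive optimizers such as Adam the two are not the same.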
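As a rough illustration of the penalty described above, the following minimal sketch computes it for a small weight vector in plain Python. The function names (`l2_penalty`, `regularized_loss`, `l2_penalty_grad`), the coefficient name `lam`, and the example values are all hypothetical. The scaling convention is also an assumption: some formulations use `lam / 2` so that the gradient is simply `lam * w`.

```python
def l2_penalty(weights, lam):
    """Penalty term: lam times the sum of squared weights (assumed scaling)."""
    return lam * sum(w * w for w in weights)


def regularized_loss(data_loss, weights, lam):
    """Total objective: the unregularized loss plus the L2 penalty."""
    return data_loss + l2_penalty(weights, lam)


def l2_penalty_grad(weights, lam):
    """Gradient of the penalty with respect to each weight: 2 * lam * w."""
    return [2.0 * lam * w for w in weights]


# Hypothetical example values, chosen only for illustration.
weights = [0.5, -2.0, 1.0]
lam = 0.1

penalty = l2_penalty(weights, lam)           # 0.1 * (0.25 + 4.0 + 1.0)
total = regularized_loss(1.0, weights, lam)  # data loss of 1.0 is made up
grad = l2_penalty_grad(weights, lam)         # pulls each weight toward zero
```

Because the gradient of the penalty is proportional to the weight itself, each gradient step shrinks large weights more than small ones, which is where the "weight decay" name comes from under plain SGD.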
Grokking is the phenomenon in which a model abruptly begins to generalize well long after it has memorized the training set, that is, after training accuracy has saturated while held-out accuracy remained low.