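Minimum Description Length (MDL) selects the model that compresses the data best, i.e. the one minimizing L(M) + L(D|M): the number of bits needed to describe the model plus the number of bits needed to describe the data once the model is known.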
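As a rough illustration of the two-part trade-off, here is a minimal sketch of MDL model selection for a binary sequence. The model family is an assumption chosen for simplicity: a Bernoulli(p) source whose parameter p is sent at a precision of b bits, so L(M) = b bits names one of 2**b grid points, and L(D|M) is the Shannon code length of the data under that p. The functions `data_cost_bits` and `mdl_select` are hypothetical names, not part of any library. The b = 0 case is the fixed fair-coin model, which costs nothing to describe.

```python
import math
from collections.abc import Sequence


def data_cost_bits(data: Sequence[int], p: float) -> float:
    """L(D|M): Shannon code length, in bits, of 0/1 data under Bernoulli(p)."""
    ones = sum(data)
    zeros = len(data) - ones
    # Grid points below never equal 0 or 1, so both logs are finite.
    return -(ones * math.log2(p) + zeros * math.log2(1 - p))


def mdl_select(data: Sequence[int], max_precision_bits: int = 8):
    """Return (total_bits, precision_bits, p) minimizing L(M) + L(D|M).

    Hypothetical model family: p quantized to the midpoints of 2**b equal
    bins, with b = 0 meaning the parameter-free fair coin p = 0.5.
    The cost of announcing b itself is the same for every candidate,
    so it is omitted: it cannot change which model wins.
    """
    best = None
    for b in range(max_precision_bits + 1):
        if b == 0:
            candidates = [0.5]
        else:
            k = 2 ** b
            candidates = [(i + 0.5) / k for i in range(k)]
        model_bits = b  # L(M): b bits pick one of 2**b grid points
        for p in candidates:
            total = model_bits + data_cost_bits(data, p)
            if best is None or total < best[0]:
                best = (total, b, p)
    return best
```

For balanced data such as `[0, 1] * 50`, no finer parameter saves enough data bits to pay for itself, so `mdl_select` returns `(100.0, 0, 0.5)`, the fair-coin model. For heavily skewed data, a coarse parameter like p = 0.875 (b = 2) wins: spending more bits on p would shorten L(D|M) by less than it lengthens L(M).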
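Grokking is a phenomenon in which a model suddenly starts to generalize well long after it has already memorized the training set: training accuracy saturates early, while validation accuracy stays low for many more steps before rising sharply.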
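One way to make "long after" concrete is to measure the delay between the two events on logged accuracy curves. The sketch below is a hypothetical helper, not an established metric: it assumes per-epoch training and validation accuracies are available and treats an accuracy of at least `threshold` (an assumed cutoff of 0.99) as "memorized" for the training curve and "generalized" for the validation curve.

```python
from collections.abc import Sequence
from typing import Optional


def first_epoch_at_or_above(curve: Sequence[float], threshold: float) -> Optional[int]:
    """Index of the first entry >= threshold, or None if it is never reached."""
    for epoch, value in enumerate(curve):
        if value >= threshold:
            return epoch
    return None


def grokking_gap(
    train_acc: Sequence[float],
    val_acc: Sequence[float],
    threshold: float = 0.99,  # assumed cutoff for "memorized" / "generalized"
) -> Optional[int]:
    """Epochs between memorizing the training set and generalizing.

    Returns None if either curve never reaches the threshold.
    A large positive gap is the signature of grokking.
    """
    memorized = first_epoch_at_or_above(train_acc, threshold)
    generalized = first_epoch_at_or_above(val_acc, threshold)
    if memorized is None or generalized is None:
        return None
    return generalized - memorized
```

A gap near zero means the model generalized roughly as soon as it fit the training data. Grokking shows up as a gap that is large relative to the time needed to memorize.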