Groups
Category
Level
Minimum Description Length (MDL) picks the model that compresses the data best by minimizing L(M) + L(D|M).
Grokking is when a model suddenly starts to generalize well long after it has already memorized the training set.
Algorithmic Information Theory studies information content via the shortest programs that generate data, rather than via average-case probabilities.