Concepts (7)

Grokking & Delayed Generalization

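Grokking is the phenomenon in which a model abruptly begins to generalize well long after it has already memorized the training set, that is, validation performance jumps many training steps after training performance has saturated.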

#grokking #delayed generalization #weight decay
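One way to make the "long after" in this definition concrete is to measure the gap between the step at which training accuracy saturates and the step at which validation accuracy catches up. The sketch below is only an illustration: the helper names `first_step_reaching` and `generalization_delay`, the 0.99 threshold, and the accuracy curves are all hypothetical, and the numbers are made up rather than measured results.

```python
def first_step_reaching(steps, accuracies, threshold):
    """Return the first logged step whose accuracy is >= threshold, or None."""
    for step, acc in zip(steps, accuracies):
        if acc >= threshold:
            return step
    return None


def generalization_delay(steps, train_acc, val_acc, threshold=0.99):
    """Steps elapsed between fitting the training set and generalizing.

    Returns None if either curve never reaches the threshold.
    """
    train_step = first_step_reaching(steps, train_acc, threshold)
    val_step = first_step_reaching(steps, val_acc, threshold)
    if train_step is None or val_step is None:
        return None
    return val_step - train_step


# Synthetic, made-up curves with a grokking-like shape (not real results).
steps = [100, 1_000, 10_000, 100_000]
train_acc = [0.60, 1.00, 1.00, 1.00]   # training set memorized by step 1,000
val_acc = [0.05, 0.10, 0.20, 0.99]     # validation catches up only at step 100,000

delay = generalization_delay(steps, train_acc, val_acc)
print(delay)
```

A large positive delay relative to the step at which training accuracy saturated is the signature described above; a delay near zero would indicate ordinary, non-delayed generalization.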

Implicit Bias of Gradient Descent

For an underdetermined linear least-squares problem (more unknowns than equations), gradient descent initialized at zero converges to the minimum Euclidean norm solution among all solutions that fit the data exactly, without any explicit regularizer.

#implicit bias
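The implicit-bias claim can be checked numerically. The following is a minimal sketch, assuming a random Gaussian system with 5 equations and 20 unknowns, the squared-error loss 0.5 * ||Aw - b||^2, and a fixed step size of 1 / sigma_max(A)^2; the problem sizes, seed, and iteration count are arbitrary choices for illustration. It compares the gradient descent iterate with the pseudoinverse solution, which is the minimum Euclidean norm solution of a consistent system.

```python
import numpy as np

rng = np.random.default_rng(0)

# Underdetermined system: fewer equations (rows) than unknowns (columns).
n_eq, n_var = 5, 20
A = rng.standard_normal((n_eq, n_var))
b = rng.standard_normal(n_eq)

# Step size 1 / sigma_max(A)^2 keeps plain gradient descent stable
# for the loss 0.5 * ||A w - b||^2.
step = 1.0 / np.linalg.norm(A, 2) ** 2

w = np.zeros(n_var)  # zero initialization is what the claim relies on
for _ in range(20_000):
    grad = A.T @ (A @ w - b)  # gradient of 0.5 * ||A w - b||^2
    w -= step * grad

# Minimum Euclidean norm solution via the Moore-Penrose pseudoinverse.
w_min_norm = np.linalg.pinv(A) @ b

residual = np.linalg.norm(A @ w - b)   # how well the equations are satisfied
gap = np.linalg.norm(w - w_min_norm)   # distance to the min-norm solution

print(f"residual: {residual:.2e}, distance to min-norm solution: {gap:.2e}")
```

The reason this works is that every gradient lies in the row space of A, so an iterate that starts at zero never leaves that subspace, and the only exact solution inside the row space is the minimum-norm one. Starting from a nonzero point would instead converge to the solution closest to that starting point.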

Concepts (7)

Grokking & Delayed Generalization

Implicit Bias of Gradient Descent

Lottery Ticket Hypothesis

Double Descent Phenomenon

Depth vs Width Tradeoffs

Scaling Laws

Universal Approximation Theorem