Minimum Description Length (MDL) picks the model that compresses the data best by minimizing L(M) + L(D|M).
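A minimal sketch of two-part MDL model selection, assuming a crude coding scheme (a fixed bit budget per parameter for L(M), and a Gaussian code length for the residuals as L(D|M)); the function names and constants are illustrative, not from the text:

```python
import math

def mdl_score(num_params, residuals, bits_per_param=32):
    # L(M): cost of describing the model, here a crude proxy of a
    # fixed number of bits per parameter.
    model_bits = num_params * bits_per_param
    # L(D|M): cost of describing the data given the model, using the
    # Gaussian code length (in bits) implied by the residual variance.
    n = len(residuals)
    var = max(sum(r * r for r in residuals) / n, 1e-12)  # guard log(0)
    data_bits = 0.5 * n * math.log2(2 * math.pi * math.e * var)
    return model_bits + data_bits

# Data that is exactly linear: a 2-parameter line fits perfectly,
# while a 1-parameter constant (the mean) leaves large residuals.
ys = [2 * x for x in range(20)]
mean_y = sum(ys) / len(ys)
score_linear = mdl_score(2, [0.0] * len(ys))
score_const = mdl_score(1, [y - mean_y for y in ys])
```

Despite the line's larger L(M), its tiny L(D|M) gives it the smaller total code length, so MDL prefers it here.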
Early stopping halts training when the validation loss stops improving, preventing overfitting and saving compute.
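A sketch of the patience-based variant of early stopping, run here over a hypothetical list of per-epoch validation losses rather than a real training loop:

```python
def train_with_early_stopping(val_losses, patience=3):
    # Stop once the validation loss has failed to improve for
    # `patience` consecutive epochs; return the stopping epoch and
    # the best loss observed.
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch, best
    return len(val_losses) - 1, best

# Improvement stalls after epoch 2, so with patience=3 training halts
# at epoch 5 and the later 0.6 is never reached.
stop_epoch, best_loss = train_with_early_stopping(
    [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.6], patience=3
)
```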
Dropout randomly turns off (zeros) some neurons during training to prevent the network from memorizing the training data.
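A sketch of the common "inverted dropout" formulation: surviving activations are scaled by 1/(1-p) during training so their expected value is unchanged, and inference applies no mask at all. The function is illustrative:

```python
import random

def dropout(activations, p=0.5, training=True, rng=None):
    # Training: zero each unit with probability p and scale survivors
    # by 1/(1-p). Inference: pass activations through untouched.
    if not training or p == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

out = dropout([1.0] * 1000, p=0.5, rng=random.Random(0))
```

Each output is therefore either 0.0 or 2.0 here, with roughly half the units dropped on any given pass.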
L2 regularization (also called ridge or weight decay) adds a penalty proportional to the sum of squared weights to discourage large parameters.
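A sketch of the ridge penalty and the gradient-descent view that gives "weight decay" its name (the penalty's gradient 2*lam*w shrinks every weight toward zero each step); function names and hyperparameters are illustrative:

```python
def l2_penalized_loss(weights, data_loss, lam):
    # Total loss = data loss + lam * sum of squared weights.
    return data_loss + lam * sum(w * w for w in weights)

def sgd_step_weight_decay(weights, grads, lr, lam):
    # The penalty contributes 2*lam*w to each weight's gradient, so
    # every update multiplicatively decays the weights.
    return [w - lr * (g + 2 * lam * w) for w, g in zip(weights, grads)]

# With zero data gradient, one step shrinks w = 1.0 to 0.9:
# 1.0 - 0.1 * (0 + 2 * 0.5 * 1.0) = 0.9.
decayed = sgd_step_weight_decay([1.0], [0.0], lr=0.1, lam=0.5)
```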
Empirical Risk Minimization (ERM) chooses a model that minimizes the average loss on the training data.
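A minimal sketch of ERM over a finite hypothesis class, assuming squared loss and a toy class of constant predictors (both are illustrative choices, not from the text):

```python
def empirical_risk(model, data, loss):
    # Average loss of `model` over the training sample.
    return sum(loss(model(x), y) for x, y in data) / len(data)

def erm(hypotheses, data, loss):
    # Return the hypothesis with the lowest empirical risk.
    return min(hypotheses, key=lambda h: empirical_risk(h, data, loss))

squared = lambda pred, y: (pred - y) ** 2
# Constant predictors h(x) = c for c in {0, 1, 2}; targets cluster
# around 1, so ERM selects c = 1.
constants = [lambda x, c=c: c for c in (0.0, 1.0, 2.0)]
train = [(None, 1.0), (None, 1.2), (None, 0.8)]
best = erm(constants, train, squared)
```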
Maximum A Posteriori (MAP) estimation chooses the parameter value with the highest posterior probability after seeing data.
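A sketch of MAP estimation for a coin's heads probability under a Beta(a, b) prior, where the posterior Beta(a + heads, b + tails) has its mode in closed form; the default Beta(2, 2) prior is an illustrative assumption:

```python
def map_bernoulli(heads, flips, a=2.0, b=2.0):
    # Posterior mode of Beta(a + heads, b + tails):
    # (heads + a - 1) / (flips + a + b - 2).
    return (heads + a - 1) / (flips + a + b - 2)

# 7 heads in 10 flips: the MLE is 0.7, but the Beta(2, 2) prior pulls
# the MAP estimate toward 0.5, giving 8/12.
estimate = map_bernoulli(7, 10)
```

With a uniform Beta(1, 1) prior the formula reduces to the MLE, heads/flips, which is a quick sanity check on the expression.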
The bias-variance tradeoff explains how prediction error splits into bias squared, variance, and irreducible noise for squared loss.
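A Monte Carlo sketch of the decomposition under squared loss, using an illustrative setup (a shrinkage estimator predicting a noisy observation of a fixed target); the scenario and constants are assumptions for the demo:

```python
import random

def bias_variance_demo(shrink=0.5, n_trials=20000, noise_sd=1.0,
                       true_y=2.0, seed=0):
    # Estimator: yhat = shrink * (true_y + noise). Shrinking trades
    # variance (shrink^2 * noise_sd^2) for bias ((shrink-1) * true_y).
    rng = random.Random(seed)
    preds = [shrink * (true_y + rng.gauss(0, noise_sd))
             for _ in range(n_trials)]
    mean_pred = sum(preds) / n_trials
    bias_sq = (mean_pred - true_y) ** 2
    variance = sum((p - mean_pred) ** 2 for p in preds) / n_trials
    noise = noise_sd ** 2  # irreducible error of a fresh observation
    return bias_sq, variance, noise

# Expected: bias^2 = (0.5*2 - 2)^2 = 1.0, variance = 0.25, noise = 1.0,
# so expected squared error on a fresh point is their sum, 2.25.
b2, var, irreducible = bias_variance_demo()
```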