Multi-task loss balancing aims to automatically set each task's weight so that no single loss dominates training.
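As a sketch of the idea, here is one simple balancing heuristic (an illustration, not any specific published method): scale each task's loss by the inverse of its running average magnitude, so tasks with larger raw losses do not dominate the combined objective.

```python
# Illustrative heuristic: divide each task's loss by an exponential
# running average of its own magnitude, so every task contributes at
# a comparable scale to the total loss.
def balanced_loss(losses, running_avgs, momentum=0.9):
    total = 0.0
    for i, loss in enumerate(losses):
        running_avgs[i] = momentum * running_avgs[i] + (1 - momentum) * loss
        total += loss / (running_avgs[i] + 1e-8)
    return total

# Task 0's raw loss is 100x task 1's; after the averages warm up,
# each task contributes roughly 1 to the total (so total ~ 2).
avgs = [1.0, 1.0]
for step in range(50):
    t = balanced_loss([100.0, 1.0], avgs)
print(t)
```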
Early stopping halts training when the validation loss stops improving, preventing overfitting and saving compute.
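A minimal sketch of the stopping rule, using a toy sequence of validation losses in place of a real training loop (the function name and `patience` parameter are illustrative):

```python
# Early stopping sketch: stop when validation loss hasn't improved
# for `patience` consecutive epochs, and report the best epoch.
def train_with_early_stopping(val_losses, patience=3):
    best = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch  # stop: no improvement for `patience` epochs
    return best_epoch

# Validation loss improves for three epochs, then plateaus/worsens:
losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.74]
print(train_with_early_stopping(losses))  # → 2 (training stops early)
```

In practice one would also restore the model weights saved at the best epoch, rather than just recording its index.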
In underdetermined linear systems (more variables than equations), gradient descent started at zero converges to the minimum Euclidean norm solution without any explicit regularizer.
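This implicit bias can be checked numerically. The sketch below (NumPy, with a made-up one-equation system) runs gradient descent from zero on the least-squares objective and compares the result to the pseudoinverse solution, which is the minimum-norm solution:

```python
import numpy as np

# One equation, two unknowns: x0 + x1 = 2 (infinitely many solutions).
A = np.array([[1.0, 1.0]])
b = np.array([2.0])

# Gradient descent on f(x) = 0.5 * ||Ax - b||^2, started at zero.
x = np.zeros(2)
eta = 0.1
for _ in range(1000):
    x = x - eta * A.T @ (A @ x - b)

# The iterates stay in the row space of A, so gradient descent converges
# to the minimum Euclidean norm solution, here [1, 1], which is exactly
# what the Moore-Penrose pseudoinverse returns.
print(x)
print(np.linalg.pinv(A) @ b)
```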
The Universal Approximation Theorems state that a feedforward network with at least one hidden layer and a suitable (non-polynomial) activation can, given enough hidden units, approximate any continuous function on a compact domain as closely as you like.
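A tiny concrete instance: the continuous function |x| can be written exactly as a one-hidden-layer ReLU network with two hidden units, since |x| = relu(x) + relu(-x). (The theorem only guarantees approximation in general; for this particular function the representation happens to be exact.)

```python
# One-hidden-layer ReLU network with two hidden units that computes |x|.
def relu(z):
    return max(z, 0.0)

def tiny_net(x):
    # hidden layer: input weights +1 and -1, zero biases
    h1, h2 = relu(x), relu(-x)
    # output layer: sum the hidden units with weight 1
    return h1 + h2

for x in [-2.5, -1.0, 0.0, 3.0]:
    assert tiny_net(x) == abs(x)
print("tiny_net matches |x| on the test points")
```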
Empirical Risk Minimization (ERM) chooses a model that minimizes the average loss on the training data.
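A worked example of ERM in its simplest form: with squared loss and a constant predictor c, the empirical risk is (1/n) Σ (y_i − c)², which is minimized by the sample mean. The data below is made up for illustration.

```python
# ERM with squared loss over the family of constant predictors.
ys = [1.0, 2.0, 3.0, 6.0]

def empirical_risk(c, ys):
    # average squared loss of predicting the constant c
    return sum((y - c) ** 2 for y in ys) / len(ys)

c_hat = sum(ys) / len(ys)  # closed-form ERM solution: the sample mean
# Sanity check: the mean beats nearby candidate constants.
assert empirical_risk(c_hat, ys) <= empirical_risk(2.9, ys)
assert empirical_risk(c_hat, ys) <= empirical_risk(3.1, ys)
print(c_hat)  # → 3.0
```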
A loss landscape is the "terrain" of a model's loss as you move through parameter space; valleys are good solutions and peaks are bad ones.
Convex optimization studies minimizing convex functions over convex sets, where every local minimum is guaranteed to be a global minimum.
Optimization theory studies how to choose variables to minimize or maximize an objective while respecting constraints.
Gradient descent updates parameters by stepping opposite the gradient: x_{t+1} = x_t - \eta \nabla f(x_t).
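The update rule above can be sketched in a few lines. This toy example (function and step size chosen for illustration) minimizes f(x) = (x − 3)², whose gradient is 2(x − 3):

```python
# Minimal gradient descent: x_{t+1} = x_t - eta * grad_f(x_t).
def grad_descent(grad_f, x0, eta=0.1, steps=100):
    x = x0
    for _ in range(steps):
        x = x - eta * grad_f(x)  # step opposite the gradient
    return x

# Minimize f(x) = (x - 3)^2; its gradient is 2*(x - 3).
x_min = grad_descent(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)  # converges to the minimizer, x = 3
```

With this step size each iteration shrinks the distance to the minimizer by a factor of 0.8, so 100 steps bring x within ~10⁻⁹ of 3.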