Empirical Risk Minimization (ERM) chooses a model that minimizes the average loss on the training data.
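A minimal sketch of ERM, assuming a toy one-parameter model: pick the constant prediction that minimizes the average squared loss over a grid of candidates (data values are illustrative).

```python
import numpy as np

# Illustrative training targets.
y = np.array([1.0, 2.0, 3.0, 4.0])

# ERM over a one-parameter hypothesis class: constant predictions c.
candidates = np.linspace(0.0, 5.0, 501)
empirical_risk = [np.mean((y - c) ** 2) for c in candidates]
c_erm = candidates[int(np.argmin(empirical_risk))]

# For squared loss, the ERM solution is the sample mean of y.
```

Under squared loss the minimizer coincides with `y.mean()`, which is why the grid search lands on 2.5 here.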
Maximum A Posteriori (MAP) estimation chooses the parameter value with the highest posterior probability after seeing data.
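A sketch of MAP in the conjugate Gaussian case, with assumed illustrative numbers: a Gaussian likelihood with known variance and a Gaussian prior on the mean give a closed-form posterior mode.

```python
import numpy as np

x = np.array([2.1, 1.9, 2.4, 2.0])   # observations, modeled as N(mu, sigma2)
sigma2 = 1.0                          # known observation variance (assumed)
mu0, tau2 = 0.0, 1.0                  # prior N(mu0, tau2) on mu (assumed)

n = len(x)
# The posterior over mu is Gaussian, so its mode (the MAP estimate) is a
# precision-weighted average of the prior mean and the data.
mu_map = (mu0 / tau2 + x.sum() / sigma2) / (1.0 / tau2 + n / sigma2)
```

With a flat prior (tau2 → ∞) the same formula collapses to the maximum-likelihood estimate, the sample mean.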
A multivariate Gaussian (normal) distribution models a vector of real-valued variables with a bell-shaped probability hill in many dimensions.
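A sketch of evaluating the multivariate Gaussian density directly from its formula, with an assumed 2-D mean and covariance:

```python
import numpy as np

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])  # illustrative covariance

def mvn_pdf(x, mu, Sigma):
    """Density of N(mu, Sigma) at point x."""
    d = len(mu)
    diff = x - mu
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

p_at_mean = mvn_pdf(mu, mu, Sigma)  # the peak of the "hill" is at the mean
```

The off-diagonal 0.5 tilts the elliptical contours, which is what distinguishes the multivariate case from a product of independent 1-D Gaussians.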
Newton's method uses both the gradient and the Hessian to take steps that aim directly at the local optimum by fitting a quadratic model of the loss around the current point.
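A 1-D sketch of Newton's method, assuming the smooth, strictly convex objective f(x) = x² + e⁻ˣ: each step jumps to the minimum of the local quadratic model, so the gradient is driven to zero very quickly.

```python
import numpy as np

# f(x) = x**2 + exp(-x): illustrative strictly convex objective.
def grad(x): return 2 * x - np.exp(-x)
def hess(x): return 2 + np.exp(-x)   # second derivative, always positive

x = 0.0
for _ in range(20):
    x -= grad(x) / hess(x)  # Newton step: minimize the local quadratic model
```

In higher dimensions the division by `hess(x)` becomes a linear solve with the Hessian matrix, which is where the method's per-step cost comes from.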
A real symmetric matrix A is positive definite if and only if x^T A x > 0 for every nonzero vector x, and positive semidefinite if x^T A x ≥ 0 for every vector x.
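For a real symmetric matrix, the x^T A x criterion is equivalent to checking the signs of the eigenvalues, which is easy to do numerically. A sketch with an assumed 2x2 example:

```python
import numpy as np

A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])  # illustrative symmetric matrix

eigs = np.linalg.eigvalsh(A)       # eigenvalues of a symmetric matrix
is_pd = bool(np.all(eigs > 0))     # positive definite: all eigenvalues > 0
is_psd = bool(np.all(eigs >= 0))   # positive semidefinite: all >= 0
```

Here the eigenvalues are 1 and 3, so A is positive definite (and therefore also positive semidefinite).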
The Evidence Lower Bound (ELBO) is a tractable lower bound on the log evidence log p(x) that enables learning and inference in latent variable models like VAEs.
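The bound can be verified exactly in a tiny discrete latent-variable model; all probabilities below are made-up illustrative numbers.

```python
import numpy as np

p_z = np.array([0.6, 0.4])          # prior over latent z in {0, 1}
p_x_given_z = np.array([0.3, 0.8])  # likelihood of the observed x under each z

log_evidence = np.log(np.sum(p_z * p_x_given_z))  # log p(x)

q = np.array([0.5, 0.5])  # an arbitrary variational distribution over z
elbo = np.sum(q * (np.log(p_z * p_x_given_z) - np.log(q)))

# The gap log p(x) - elbo equals KL(q || p(z|x)), which is >= 0,
# so the ELBO never exceeds the log evidence.
```

Maximizing the ELBO over q tightens the bound; it becomes equality exactly when q matches the true posterior p(z|x).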
Information Bottleneck (IB) studies how to compress an input X into a representation Z that still preserves what is needed to predict Y.
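A toy sketch of the IB idea, under an assumed setup: X is uniform on {0,1,2,3}, the label is Y = X mod 2, and the representation Z = X mod 2 discards a bit of X while keeping everything needed for Y.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

H_X = entropy([0.25] * 4)   # X carries 2 bits
H_Z = entropy([0.5, 0.5])   # Z carries 1 bit: Z compresses X
# Since Y = Z deterministically, I(Z; Y) = H(Y) = 1 bit = I(X; Y):
# the compression loses nothing that is relevant for predicting Y.
```

The IB objective trades these two quantities off explicitly: minimize I(X;Z) (compression) while keeping I(Z;Y) (prediction) high.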
Statistical learning theory explains why a model that fits training data can still predict well on unseen data by relating true risk to empirical risk plus a complexity term.
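A numeric sketch of the building block behind such bounds, Hoeffding's inequality: for one fixed hypothesis with 0/1 loss and n i.i.d. samples, P(|R - R̂| > ε) ≤ 2·exp(-2nε²). The sample size and tolerance below are illustrative.

```python
import numpy as np

n, eps = 1000, 0.05
bound = 2 * np.exp(-2 * n * eps ** 2)  # Hoeffding tail bound, about 0.013

# A union bound over a finite hypothesis class of size |H| multiplies this
# by |H|; taking logs is one way the "complexity term" enters the bound.
```

So with 1000 samples, a single fixed hypothesis has under a 1.4% chance of its empirical risk being off from the true risk by more than 0.05.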
The bias–variance tradeoff explains how prediction error splits into bias squared, variance, and irreducible noise for squared loss.
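The decomposition can be checked by Monte Carlo with a simple assumed setup: estimating a constant theta from n noisy samples with a shrinkage estimator, whose MSE splits exactly into bias² + variance.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, sigma, n, lam = 2.0, 1.0, 10, 0.8  # lam < 1 shrinks toward zero

# Repeat the experiment many times to estimate the estimator's distribution.
estimates = np.array([lam * rng.normal(theta, sigma, n).mean()
                      for _ in range(200_000)])

bias2 = (estimates.mean() - theta) ** 2   # approx ((lam - 1) * theta)**2
variance = estimates.var()                # approx lam**2 * sigma**2 / n
mse = np.mean((estimates - theta) ** 2)   # equals bias2 + variance
```

Shrinking (lam < 1) adds bias but reduces variance; the tradeoff is whether the variance saved outweighs the bias² incurred.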