Expectation is the long-run average value of a random variable and acts like the balance point of its distribution.
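A minimal sketch of both readings of expectation, using a fair six-sided die as a toy example: the weighted average over the distribution, and the long-run sample average it is the limit of.

```python
import random

# Expectation of a fair six-sided die: E[X] = (1 + 2 + ... + 6) / 6 = 3.5.
values = [1, 2, 3, 4, 5, 6]
exact = sum(v * (1 / 6) for v in values)  # weighted average over the distribution

# Monte Carlo estimate: the long-run average of samples converges to E[X].
random.seed(0)
samples = [random.choice(values) for _ in range(100_000)]
estimate = sum(samples) / len(samples)

print(exact)  # 3.5
```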
Conditional probability measures the chance of event A happening when we already know event B happened.
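A quick worked illustration of P(A|B) = P(A and B) / P(B) on a hypothetical two-dice example, where counting outcomes works because all 36 outcomes are equally likely.

```python
# A = "the two dice sum to 8", B = "the first die shows at least 4".
outcomes = [(i, j) for i in range(1, 7) for j in range(1, 7)]
B = [o for o in outcomes if o[0] >= 4]                # 18 outcomes
A_and_B = [o for o in B if o[0] + o[1] == 8]          # (4,4), (5,3), (6,2)

# Knowing B happened shrinks the sample space to B itself.
p_given = len(A_and_B) / len(B)
print(p_given)  # 3/18 = 1/6
```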
A random variable maps uncertain outcomes to numbers and is described by a distribution that assigns likelihoods to values or ranges.
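A toy example of the mapping view: X = number of heads in two fair coin flips. The random variable sends each outcome to a number, and its distribution assigns a probability to each value X can take.

```python
# Sample space: the four equally likely flip pairs.
outcomes = [("H", "H"), ("H", "T"), ("T", "H"), ("T", "T")]

# X maps each outcome to a number (the head count).
X = {o: o.count("H") for o in outcomes}

# The induced distribution of X: probability of each value.
dist = {}
for o, x in X.items():
    dist[x] = dist.get(x, 0.0) + 0.25

print(dist)  # {2: 0.25, 1: 0.5, 0: 0.25}
```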
Kolmogorov's axioms define probability as a measure on events: non-negativity, normalization, and countable additivity.
A loss landscape is the "terrain" of a model's loss as you move through parameter space; valleys are good solutions and peaks are bad ones.
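On a finite sample space the three axioms can be checked exhaustively; countable additivity reduces to finite additivity over disjoint events. A sketch with an assumed toy measure, using exact fractions so additivity holds with no rounding:

```python
from fractions import Fraction
from itertools import chain, combinations

# Toy probability measure on a three-point sample space (assumed for illustration).
omega = frozenset({"a", "b", "c"})
p = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}

def P(event):
    return sum(p[x] for x in event)

# All events: every subset of the sample space.
events = [frozenset(s) for s in chain.from_iterable(
    combinations(omega, r) for r in range(len(omega) + 1))]

assert all(P(e) >= 0 for e in events)   # non-negativity
assert P(omega) == 1                    # normalization
assert all(P(e1 | e2) == P(e1) + P(e2)  # additivity for disjoint events
           for e1 in events for e2 in events if not e1 & e2)
print("axioms hold")
```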
Weight initialization sets the starting values of neural network parameters so signals and gradients neither explode nor vanish as they pass through layers.
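One common recipe is Glorot/Xavier uniform initialization, which draws weights from a range sized by the layer's fan-in and fan-out so activation variance stays roughly constant across layers. A minimal stdlib-only sketch:

```python
import math
import random

def xavier_uniform(fan_in, fan_out, rng):
    """Glorot/Xavier uniform init: draw from U(-b, b) with b = sqrt(6 / (fan_in + fan_out))."""
    bound = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-bound, bound) for _ in range(fan_out)]
            for _ in range(fan_in)]

rng = random.Random(0)
W = xavier_uniform(256, 128, rng)  # weight matrix for a 256 -> 128 layer
bound = math.sqrt(6.0 / (256 + 128))
print(all(-bound <= w <= bound for row in W for w in row))  # True
```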
Gradient clipping limits how large gradient values or their overall magnitude can become during optimization to prevent exploding updates.
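A sketch of clipping by global norm, the variant that rescales the whole gradient when its overall L2 magnitude exceeds a threshold, preserving the update direction:

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Scale the gradient vector down if its L2 norm exceeds max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > max_norm:
        scale = max_norm / norm
        return [g * scale for g in grads]
    return grads

g = [3.0, 4.0]                           # norm = 5
clipped = clip_by_global_norm(g, 1.0)    # rescaled to norm 1, same direction
print(clipped)
```

Clipping each component individually (clip-by-value) is the simpler alternative, but it changes the gradient's direction; clipping by norm does not.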
Newton's method uses both the gradient and the Hessian to take steps that aim directly at the local optimum by fitting a quadratic model of the loss around the current point.
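In one dimension the Newton step is x - f'(x)/f''(x): the exact minimizer of the local quadratic fit. A minimal sketch on a toy function with a known minimum:

```python
def newton_1d(grad, hess, x0, steps=10):
    """Minimize a smooth 1-D function by repeatedly jumping to the
    minimizer of the local quadratic model."""
    x = x0
    for _ in range(steps):
        x = x - grad(x) / hess(x)
    return x

# Toy example: f(x) = x - ln(x), minimized at x = 1
# (f'(x) = 1 - 1/x, f''(x) = 1/x^2).
x_star = newton_1d(lambda x: 1 - 1 / x, lambda x: 1 / x ** 2, x0=0.5)
print(round(x_star, 6))  # 1.0
```

Note the characteristic quadratic convergence: the number of correct digits roughly doubles each iteration, so ten steps vastly overshoots what this example needs.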
Lagrange multipliers let you optimize a function while exactly satisfying equality constraints by introducing auxiliary variables (the multipliers).
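A worked toy example: maximize f(x, y) = xy subject to x + y = 10. Setting the gradient of the Lagrangian L = f - λ(x + y - 10) to zero gives y = λ and x = λ, so x = y = 5 with λ = 5. The snippet below just verifies the defining condition at that solution: ∇f is a multiple (λ) of ∇g, and the constraint holds exactly.

```python
# Solution of: maximize x*y subject to g(x, y) = x + y - 10 = 0.
x, y, lam = 5.0, 5.0, 5.0

grad_f = (y, x)        # gradient of f(x, y) = x*y
grad_g = (1.0, 1.0)    # gradient of the constraint g

# Stationarity: grad f = lam * grad g, and the constraint is satisfied.
print(grad_f == (lam * grad_g[0], lam * grad_g[1]))  # True
print(x + y == 10.0)                                 # True
```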
Learning rate schedules control how fast a model learns over time by changing the learning rate across iterations or epochs.
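One popular schedule is cosine annealing: start at a maximum learning rate and decay smoothly to a minimum over training. A minimal sketch (the lr_max/lr_min values are illustrative, not canonical):

```python
import math

def cosine_schedule(step, total_steps, lr_max=0.1, lr_min=0.001):
    """Cosine annealing: lr_max at step 0, decaying smoothly to lr_min."""
    t = step / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))

print(cosine_schedule(0, 100))    # 0.1 at the start
print(cosine_schedule(100, 100))  # 0.001 at the end
```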
Adam is an optimization algorithm that combines momentum (first moment) with RMSProp-style adaptive learning rates (second moment).
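A minimal sketch of one Adam update with the standard defaults (lr=0.001, β1=0.9, β2=0.999), showing both moment estimates and the bias correction that compensates for their zero initialization:

```python
import math

def adam_step(params, grads, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: momentum on the gradient (m) plus
    RMSProp-style per-parameter scaling (v)."""
    new_params, new_m, new_v = [], [], []
    for p, g, mi, vi in zip(params, grads, m, v):
        mi = b1 * mi + (1 - b1) * g          # first moment: mean of gradients
        vi = b2 * vi + (1 - b2) * g * g      # second moment: mean of squared gradients
        m_hat = mi / (1 - b1 ** t)           # bias correction for zero init
        v_hat = vi / (1 - b2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(mi)
        new_v.append(vi)
    return new_params, new_m, new_v

p, m, v = [1.0], [0.0], [0.0]
p, m, v = adam_step(p, [2.0], m, v, t=1)
print(p)  # first step moves by roughly lr, regardless of the gradient's scale
```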
Momentum methods add an exponentially weighted memory of past gradients to make descent steps smoother and faster, especially in ravines and ill-conditioned problems.
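A minimal heavy-ball sketch on a toy quadratic: the velocity term is a decaying sum of past gradients, so successive steps in a consistent direction reinforce each other while oscillating components cancel.

```python
def momentum_step(x, velocity, grad, lr=0.1, beta=0.9):
    """Heavy-ball momentum: velocity is an exponentially weighted
    memory of past gradients."""
    velocity = beta * velocity + grad
    return x - lr * velocity, velocity

# Toy example: minimize f(x) = x^2, whose gradient is 2x.
x, v = 5.0, 0.0
for _ in range(300):
    x, v = momentum_step(x, v, grad=2 * x)
print(abs(x) < 1e-3)  # True: converged near the minimum at 0
```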