Concepts (64)

📚 Theory · Intermediate

Scaling Laws

Scaling laws say that model loss typically follows a power law in parameters, data, and compute, so loss improves predictably as you scale them up.
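
A minimal sketch of the idea, fitting a Chinchilla-style power law L(N) = E + A/N^α along the parameter axis to synthetic losses (the data points, constants, and single-axis form are illustrative assumptions, not measurements):

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)

# Synthetic losses generated from a known power law plus a little noise,
# standing in for measured training losses at several parameter counts N.
N = np.array([1e7, 3e7, 1e8, 3e8, 1e9])
true_E, true_A, true_alpha = 1.7, 400.0, 0.34
loss = true_E + true_A / N**true_alpha + rng.normal(0.0, 0.01, size=N.size)

# Chinchilla-style form along the parameter axis: L(N) = E + A / N**alpha,
# where E is the irreducible loss and alpha is the scaling exponent.
def power_law(N, E, A, alpha):
    return E + A / N**alpha

(E, A, alpha), _ = curve_fit(power_law, N, loss, p0=(1.5, 300.0, 0.3), maxfev=20000)
print(f"fitted irreducible loss E={E:.2f}, exponent alpha={alpha:.2f}")

# Extrapolation is the whole point of a scaling law, but it is only trusted
# near the regime the law was fit on.
print("predicted loss at N=1e10:", round(power_law(1e10, E, A, alpha), 3))
```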

#scaling laws · #power law · #chinchilla scaling (+12 more)
📚 Theory · Advanced

Calculus of Variations

Calculus of variations optimizes functionals (quantities that assign a number to an entire function) rather than ordinary functions of numbers.
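
The standard form of the statement, with the shortest-path functional as a worked example (J, L, y are the usual textbook symbols):

```latex
% A functional assigns a number to a whole curve y(x):
J[y] = \int_{a}^{b} L\!\left(x,\, y(x),\, y'(x)\right)\, dx .

% Stationary points of J satisfy the Euler-Lagrange equation:
\frac{\partial L}{\partial y} \;-\; \frac{d}{dx}\,\frac{\partial L}{\partial y'} \;=\; 0 .

% Example: arc length, L = \sqrt{1 + (y')^2}. Here \partial L / \partial y = 0, so
% \frac{d}{dx}\Big( \tfrac{y'}{\sqrt{1 + (y')^2}} \Big) = 0 \;\Rightarrow\; y'' = 0,
% i.e. the extremals are straight lines, as expected for shortest paths.
```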

#calculus of variations · #euler–lagrange · #functional derivative (+12 more)
📚 Theory · Advanced

Deep Learning Generalization Theory

Deep learning generalization theory tries to explain why overparameterized networks can fit (interpolate) training data yet still perform well on new data.
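
A toy version of the phenomenon in overparameterized linear regression, where the minimum-norm interpolator stands in for the trained network (the dimensions, noise level, and sparse ground truth are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Overparameterized linear regression: more features (d) than samples (n),
# so infinitely many weight vectors fit the training data exactly.
n, d = 20, 200
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:5] = 1.0                                  # sparse "ground truth"
y = X @ w_true + 0.1 * rng.normal(size=n)

# The pseudoinverse returns the minimum-L2-norm interpolator, which is also
# what gradient descent from zero initialization converges to in this setting.
w = np.linalg.pinv(X) @ y

X_test = rng.normal(size=(5000, d))
print("train MSE:", np.mean((X @ w - y) ** 2))                    # ~0: exact interpolation
print("test  MSE:", np.mean((X_test @ w - X_test @ w_true) ** 2))
# The interpolator's test error stays finite and depends on the data
# distribution; characterizing when it stays small is what the theory studies.
```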

#generalization · #implicit regularization · #minimum norm (+12 more)
📚 Theory · Advanced

Neural Network Expressivity

Neural network expressivity studies what kinds of functions different network architectures can represent and how efficiently they can do so.
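
One concrete expressivity measure is the number of linear pieces a ReLU network computes. The sketch below estimates it for small randomly initialized networks on a 1-D input (the layer sizes and grid resolution are arbitrary choices, and random weights typically realize far fewer regions than worst-case expressivity bounds allow):

```python
import numpy as np

rng = np.random.default_rng(0)

def activation_patterns(x, weights, biases):
    """ReLU on/off pattern of every hidden unit, for each input row of x."""
    patterns, h = [], x
    for W, b in zip(weights, biases):
        pre = h @ W + b
        patterns.append(pre > 0)
        h = np.maximum(pre, 0.0)
    return np.concatenate(patterns, axis=1)

def count_linear_regions(depth, width, n_grid=200_000):
    # Random network with 1-D input; along a line, every change of activation
    # pattern marks the boundary between two adjacent linear pieces.
    weights = [rng.normal(size=(1, width))] + \
              [rng.normal(size=(width, width)) for _ in range(depth - 1)]
    biases = [rng.normal(size=width) for _ in range(depth)]
    x = np.linspace(-3.0, 3.0, n_grid).reshape(-1, 1)
    pats = activation_patterns(x, weights, biases)
    changes = np.any(pats[1:] != pats[:-1], axis=1).sum()
    return changes + 1            # grid-resolution-limited estimate

for depth in (1, 2, 3, 4):
    print(f"depth {depth}, width 8: ~{count_linear_regions(depth, 8)} linear regions on [-3, 3]")
```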

#neural network expressivity · #depth separation · #relu linear regions (+12 more)
📚 Theory · Advanced

Statistical Learning Theory

Statistical learning theory explains why a model that fits training data can still predict well on unseen data by relating true risk to empirical risk plus a complexity term.
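
The simplest instance of that "empirical risk plus a complexity term" shape is the finite-class bound obtained from Hoeffding's inequality plus a union bound (loss assumed bounded in [0, 1]; R is true risk, R-hat is empirical risk, m is the sample size):

```latex
% With probability at least 1 - \delta over an i.i.d. sample of size m,
% simultaneously for every h in a finite class \mathcal{H} (loss in [0,1]):
R(h) \;\le\; \widehat{R}(h) \;+\; \sqrt{\frac{\ln\lvert\mathcal{H}\rvert + \ln(1/\delta)}{2m}} .
% The square-root term is the complexity penalty; for infinite classes it is
% replaced by VC-dimension or Rademacher-complexity quantities.
```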

#statistical learning theory · #empirical risk minimization · #structural risk minimization (+11 more)
📚 Theory · Intermediate

Universal Approximation Theorem

The Universal Approximation Theorem (UAT) says a feedforward neural network with a single hidden layer of enough units and a non-polynomial activation (such as sigmoid or ReLU) can approximate any continuous function on a compact set to any desired accuracy.
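
A quick numerical illustration rather than the theorem's proof: one hidden ReLU layer with random weights and a least-squares output layer, where widening the layer drives down the error on a compact interval (the target function, widths, and the random-features shortcut are assumptions of this sketch):

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: a continuous function on a compact set.
x = np.linspace(-np.pi, np.pi, 400).reshape(-1, 1)
y = np.sin(3 * x).ravel()

def one_hidden_layer_fit(x, y, width):
    # Random hidden ReLU units; only the output weights are solved exactly,
    # which is enough to see the effect of width on approximation quality.
    W = rng.normal(size=(1, width))
    b = rng.uniform(-np.pi, np.pi, size=width)
    H = np.maximum(x @ W + b, 0.0)                 # hidden activations
    c, *_ = np.linalg.lstsq(H, y, rcond=None)      # output layer
    return H @ c

for width in (5, 50, 500):
    err = np.max(np.abs(one_hidden_layer_fit(x, y, width) - y))
    print(f"width {width:4d}: max error on the grid = {err:.4f}")
```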

#universal approximation theorem · #cybenko · #hornik (+12 more)
📚 Theory · Intermediate

Minimax Theorem

The Minimax Theorem states that in two-player zero-sum games with suitable convexity and compactness, the maximizer's best guaranteed payoff (the max-min value) equals the minimizer's best guaranteed ceiling on loss (the min-max value).
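
The finite (matrix-game) case can be checked directly with linear programming: solving the maximizer's and the minimizer's problems for the same payoff matrix yields the same value (the payoff matrix below is an arbitrary example):

```python
import numpy as np
from scipy.optimize import linprog

# Payoff matrix of a zero-sum game; entries are payoffs to the row (maximizing) player.
A = np.array([[ 1.0, -1.0,  3.0],
              [ 3.0,  5.0, -3.0],
              [ 6.0,  2.0, -2.0]])

def game_value(A):
    # Variables [x_1..x_m, v]: maximize v subject to (A^T x)_j >= v for every
    # column j, with x a probability vector. linprog minimizes, hence the -v objective.
    m, n = A.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0
    A_ub = np.hstack([-A.T, np.ones((n, 1))])        # v - (A^T x)_j <= 0
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[m]

x_star, max_min = game_value(A)          # maximizer's best guaranteed payoff
y_star, neg_min_max = game_value(-A.T)   # minimizer's problem, via the transposed game
print("row strategy:", np.round(x_star, 3), " max-min value:", round(max_min, 4))
print("col strategy:", np.round(y_star, 3), " min-max value:", round(-neg_min_max, 4))
# The minimax theorem is the statement that these two printed values coincide.
```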

#minimax theorem · #zero-sum games · #saddle point (+12 more)
📚 Theory · Intermediate

PAC Learning

PAC learning formalizes when a learner can probably (with probability at least 1−δ) and approximately (error at most ε) succeed using a number of samples polynomial in 1/ε and 1/δ.
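
In the realizable case with a finite hypothesis class, the standard sample-complexity bound makes the "polynomial number of samples" concrete (ε, δ, H, and m as in the usual definition):

```latex
% \mathcal{H} is PAC-learnable if for all \varepsilon, \delta \in (0,1) there is a
% sample size m(\varepsilon, \delta), polynomial in 1/\varepsilon and 1/\delta, such that
% with probability at least 1-\delta the learner outputs h with \mathrm{err}(h) \le \varepsilon.
% For a finite class in the realizable setting, any consistent learner succeeds once
m \;\ge\; \frac{1}{\varepsilon}\left(\ln\lvert\mathcal{H}\rvert + \ln\frac{1}{\delta}\right).
```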

#pac learning · #agnostic learning · #vc dimension (+12 more)
📚 Theory · Advanced

VC Dimension

VC dimension is the size of the largest set of points on which a hypothesis class can realize every possible labeling, i.e. shatter the set.
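
A brute-force check of the definition for the class of intervals on the real line (the point sets and the finite grid of hypotheses are illustrative choices; the conclusion, VC dimension 2, is the standard textbook result):

```python
import numpy as np
from itertools import product

def interval_classifier(a, b):
    # Hypothesis: label 1 inside [a, b], 0 outside.
    return lambda x: ((x >= a) & (x <= b)).astype(int)

def is_shattered(points, hypotheses):
    """True if every possible 0/1 labeling of `points` is realized by some hypothesis."""
    realized = {tuple(int(v) for v in h(points)) for h in hypotheses}
    return all(lab in realized for lab in product([0, 1], repeat=len(points)))

# A finite grid of interval hypotheses is enough for this discrete check.
grid = np.linspace(-1, 4, 51)
hypotheses = [interval_classifier(a, b) for a in grid for b in grid if a <= b]

print("2 points shattered:", is_shattered(np.array([0.0, 1.0]), hypotheses))       # True
print("3 points shattered:", is_shattered(np.array([0.0, 1.0, 2.0]), hypotheses))  # False
# Intervals shatter a 2-point set but no 3-point set, so their VC dimension is 2:
# the labeling (1, 0, 1) would need the middle point excluded while both outer
# points are included, which no single interval can do.
```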

#vc dimension · #vapnik chervonenkis · #shattering (+12 more)
📚 Theory · Intermediate

Bias-Variance Tradeoff

The bias–variance tradeoff explains how, under squared loss, expected prediction error decomposes into squared bias, variance, and irreducible noise.
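
A Monte Carlo check of the decomposition for polynomial regression on a synthetic problem (the target function, noise level, sample size, and degrees are all arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return np.sin(2 * np.pi * x)                    # true regression function

noise_sd = 0.3
x_test = np.linspace(0.05, 0.95, 50)

def fit_and_predict(degree, n_train=30):
    # Draw a fresh training set and fit a polynomial of the given degree.
    x = rng.uniform(0, 1, n_train)
    y = f(x) + rng.normal(0, noise_sd, n_train)
    return np.polyval(np.polyfit(x, y, degree), x_test)

for degree in (1, 3, 9):
    preds = np.array([fit_and_predict(degree) for _ in range(500)])   # 500 training sets
    bias2 = np.mean((preds.mean(axis=0) - f(x_test)) ** 2)            # squared bias
    variance = np.mean(preds.var(axis=0))                             # variance across datasets
    print(f"degree {degree}: bias^2={bias2:.3f}  variance={variance:.3f}  "
          f"expected MSE ~ {bias2 + variance + noise_sd**2:.3f}")
# Low degree: high bias, low variance. High degree: low bias, high variance.
```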

#bias variance tradeoff · #mse decomposition · #polynomial regression (+12 more)
📚 Theory · Advanced

Rademacher Complexity

Rademacher complexity is a data-dependent measure of how well a function class can fit random noise on a given sample.
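
A direct Monte Carlo estimate for a class where the supremum has a closed form, namely linear functions with unit-norm weights (the sample distribution and sizes are illustrative; the roughly 1/sqrt(m) decay is the point):

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_rademacher(X, n_draws=2000):
    """Monte Carlo estimate of the empirical Rademacher complexity of the class
    {x -> <w, x> : ||w||_2 <= 1} on the fixed sample X (shape m x d).

    For this class the supremum has a closed form:
        sup_{||w||<=1} (1/m) * sum_i sigma_i * <w, x_i> = ||(1/m) * sum_i sigma_i * x_i||_2
    """
    m = X.shape[0]
    sigma = rng.choice([-1.0, 1.0], size=(n_draws, m))   # random sign vectors
    sups = np.linalg.norm(sigma @ X, axis=1) / m
    return sups.mean()

d = 10
for m in (50, 200, 800, 3200):
    X = rng.normal(size=(m, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)        # put the sample on the unit sphere
    print(f"m={m:5d}: empirical Rademacher complexity ~ {empirical_rademacher(X):.4f}")
# The estimate shrinks roughly like 1/sqrt(m): the class cannot fit random signs
# well once the sample is large, which is what generalization bounds exploit.
```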

#rademacher complexity · #empirical rademacher · #generalization bounds (+12 more)
📚 Theory · Intermediate

Game Theory

Game theory studies strategic decision-making among multiple players where each player’s payoff depends on everyone’s actions.
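
A minimal sketch of the "payoff depends on everyone's actions" point: a brute-force search for pure-strategy Nash equilibria in the Prisoner's Dilemma (the payoff numbers below are one conventional choice):

```python
import numpy as np
from itertools import product

# Prisoner's Dilemma payoffs; strategy 0 = cooperate, 1 = defect.
payoff_row = np.array([[-1, -3],
                       [ 0, -2]])
payoff_col = payoff_row.T          # the game is symmetric

def pure_nash_equilibria(payoff_row, payoff_col):
    """All pure-strategy profiles where neither player gains by deviating alone."""
    n_r, n_c = payoff_row.shape
    equilibria = []
    for i, j in product(range(n_r), range(n_c)):
        row_best = payoff_row[i, j] >= payoff_row[:, j].max()   # row can't improve
        col_best = payoff_col[i, j] >= payoff_col[i, :].max()   # column can't improve
        if row_best and col_best:
            equilibria.append((i, j))
    return equilibria

print(pure_nash_equilibria(payoff_row, payoff_col))   # [(1, 1)]: mutual defection
```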

#game theory · #nash equilibrium · #mixed strategies (+11 more)