๐ŸŽ“How I Study AIHISA
๐Ÿ“–Read
๐Ÿ“„Papers๐Ÿ“ฐBlogs๐ŸŽฌCourses
๐Ÿ’กLearn
๐Ÿ›ค๏ธPaths๐Ÿ“šTopics๐Ÿ’กConcepts๐ŸŽดShorts
๐ŸŽฏPractice
๐Ÿ“Daily Log๐ŸŽฏPrompts๐Ÿง Review
SearchSettings
How I Study AI - Learn AI Papers & Lectures the Easy Way

Concepts152

Groups

๐Ÿ“Linear Algebra15๐Ÿ“ˆCalculus & Differentiation10๐ŸŽฏOptimization14๐ŸŽฒProbability Theory12๐Ÿ“ŠStatistics for ML9๐Ÿ“กInformation Theory10๐Ÿ”บConvex Optimization7๐Ÿ”ขNumerical Methods6๐Ÿ•ธGraph Theory for Deep Learning6๐Ÿ”ตTopology for ML5๐ŸŒDifferential Geometry6โˆžMeasure Theory & Functional Analysis6๐ŸŽฐRandom Matrix Theory5๐ŸŒŠFourier Analysis & Signal Processing9๐ŸŽฐSampling & Monte Carlo Methods10๐Ÿง Deep Learning Theory12๐Ÿ›ก๏ธRegularization Theory11๐Ÿ‘๏ธAttention & Transformer Theory10๐ŸŽจGenerative Model Theory11๐Ÿ”ฎRepresentation Learning10๐ŸŽฎReinforcement Learning Mathematics9๐Ÿ”„Variational Methods8๐Ÿ“‰Loss Functions & Objectives10โฑ๏ธSequence & Temporal Models8๐Ÿ’ŽGeometric Deep Learning8

Category

๐Ÿ”ทAllโˆ‘Mathโš™๏ธAlgo๐Ÿ—‚๏ธDS๐Ÿ“šTheory

Level

AllBeginnerIntermediateAdvanced
๐Ÿ“šTheoryAdvanced

Transformer Theory

Transformers map sequences to sequences using layers of self-attention and feed-forward networks wrapped with residual connections and LayerNorm.

#transformer#self-attention#positional encoding+12
๐Ÿ“šTheoryAdvanced

Reinforcement Learning Theory

Reinforcement Learning (RL) studies how an agent learns to act in an environment to maximize long-term cumulative reward.

#reinforcement learning
910111213
#mdp
#bellman equation
+12
๐Ÿ“šTheoryAdvanced

Neural Tangent Kernel (NTK) Theory

The Neural Tangent Kernel (NTK) connects very wide neural networks to classical kernel methods, letting us study training as if it were kernel regression.

#neural tangent kernel#ntk#infinite width+12
๐Ÿ“šTheoryIntermediate

Attention Mechanism Theory

Attention computes a weighted sum of values V where the weights come from how similar queries Q are to keys K.

#attention#self-attention#multi-head attention+12
๐Ÿ“šTheoryIntermediate

Scaling Laws

Scaling laws say that model loss typically follows a power law that improves predictably as you increase parameters, data, or compute.

#scaling laws#power law#chinchilla scaling+12
๐Ÿ“šTheoryAdvanced

Calculus of Variations

Calculus of variations optimizes functionalsโ€”numbers produced by whole functionsโ€”rather than ordinary functions of numbers.

#calculus of variations#eulerโ€“lagrange#functional derivative+12
๐Ÿ“šTheoryAdvanced

Deep Learning Generalization Theory

Deep learning generalization theory tries to explain why overparameterized networks can fit (interpolate) training data yet still perform well on new data.

#generalization#implicit regularization#minimum norm+12
๐Ÿ“šTheoryAdvanced

Neural Network Expressivity

Neural network expressivity studies what kinds of functions different network architectures can represent and how efficiently they can do so.

#neural network expressivity#depth separation#relu linear regions+12
๐Ÿ“šTheoryAdvanced

Statistical Learning Theory

Statistical learning theory explains why a model that fits training data can still predict well on unseen data by relating true risk to empirical risk plus a complexity term.

#statistical learning theory#empirical risk minimization#structural risk minimization+11
๐Ÿ“šTheoryIntermediate

Universal Approximation Theorem

The Universal Approximation Theorem (UAT) says a feedforward neural network with one hidden layer and a non-polynomial activation (like sigmoid or ReLU) can approximate any continuous function on a compact set as closely as we want.

#universal approximation theorem#cybenko#hornik+12
๐Ÿ“šTheoryIntermediate

Minimax Theorem

The Minimax Theorem states that in zero-sum two-player games with suitable convexity and compactness, the best guaranteed payoff for the maximizer equals the worst-case loss for the minimizer.

#minimax theorem#zero-sum games#saddle point+12
๐Ÿ“šTheoryIntermediate

PAC Learning

PAC learning formalizes when a learner can probably (with probability at least 1โˆ’ฮด) and approximately (error at most ฮต) succeed using a polynomial number of samples.

#pac learning#agnostic learning#vc dimension+12