How I Study AI - Learn AI Papers & Lectures the Easy Way

Concepts (12)

📚 Theory · Advanced

Feature Learning vs Kernel Regime

The kernel (lazy) regime keeps neural network parameters close to their initialization, making training equivalent to kernel regression with a fixed kernel such as the Neural Tangent Kernel (NTK).

#neural tangent kernel, #kernel ridge regression, #lazy training
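
To make the "parameters stay close to initialization" idea concrete, here is the first-order expansion usually associated with the lazy regime. The symbols (f for the network output, θ₀ for the initial parameters, Θ₀ for the kernel) are notation assumed for this sketch, not taken from the card.

```latex
f(x;\theta) \;\approx\; f_{\mathrm{lin}}(x;\theta)
  = f(x;\theta_0) + \nabla_\theta f(x;\theta_0)^{\top} (\theta - \theta_0),
\qquad
\Theta_0(x, x') = \nabla_\theta f(x;\theta_0)^{\top} \nabla_\theta f(x';\theta_0)
```

When this approximation holds throughout training, the model is linear in θ with fixed features ∇f(x;θ₀), which is why training reduces to regression with the fixed kernel Θ₀. Feature learning is the complementary case, where those gradient features change appreciably during training.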
📚 Theory · Intermediate

Grokking & Delayed Generalization

Grokking is when a model suddenly starts to generalize well long after it has already memorized the training set.

#grokking, #delayed generalization, #weight decay
📚 Theory · Advanced

Mean Field Theory of Neural Networks

Mean field theory treats very wide, randomly initialized neural networks as averaging machines in which each neuron behaves like a sample from a common distribution.

#mean field theory, #neural tangent kernel, #neural network gaussian process
📚 Theory · Advanced

Information Bottleneck in Deep Learning

The Information Bottleneck (IB) principle formalizes learning compact representations T that keep only the information about X that is useful for predicting Y.

#information bottleneck, #variational information bottleneck, #mutual information
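
The principle above is commonly written as a trade-off between compression and prediction. A sketch of that objective, where the trade-off weight β and the Markov-chain assumption are notation added here for illustration:

```latex
\min_{p(t \mid x)} \; I(X;T) \;-\; \beta \, I(T;Y),
\qquad \beta > 0, \qquad Y \leftrightarrow X \leftrightarrow T
```

The first term penalizes information that T retains about X, the second rewards information that T carries about Y, and β sets how much predictive information is worth per unit of compression.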
📚 Theory · Advanced

Generalization Bounds for Deep Learning

Generalization bounds give upper limits on the gap between training error and test error, and aim to explain why deep neural networks can perform well on unseen data despite having many parameters.

#generalization bounds, #pac-bayes, #compression bounds
📚 Theory · Intermediate

Implicit Bias of Gradient Descent

In underdetermined linear systems (more variables than equations), gradient descent started at zero converges to the minimum Euclidean norm solution without any explicit regularizer.

#implicit bias, #gradient descent, #minimum norm
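
The claim above can be checked numerically. The following is a minimal sketch, assuming a random 5x20 system and a least-squares loss (both chosen here for illustration): it runs plain gradient descent from zero and compares the result with the pseudoinverse solution, which is the minimum Euclidean norm solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n_eq, n_var = 5, 20                      # fewer equations than unknowns
A = rng.standard_normal((n_eq, n_var))
b = rng.standard_normal(n_eq)

# Gradient descent on 0.5 * ||A w - b||^2, started at zero, no regularizer.
w = np.zeros(n_var)
step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / (largest singular value)^2
for _ in range(5000):
    w -= step * A.T @ (A @ w - b)

# Minimum Euclidean norm solution of A w = b via the pseudoinverse.
w_min_norm = np.linalg.pinv(A) @ b

gap = np.linalg.norm(w - w_min_norm)
residual = np.linalg.norm(A @ w - b)
print(f"distance to min-norm solution: {gap:.2e}, residual: {residual:.2e}")
```

The reason this works is that every gradient step adds a vector of the form Aᵀv, so an iterate that starts at zero never leaves the row space of A, and the only exact solution in that subspace is the minimum-norm one.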
📚 Theory · Intermediate

Lottery Ticket Hypothesis

The Lottery Ticket Hypothesis (LTH) says that inside a large dense neural network there exist small sparse subnetworks that, when trained in isolation from their original initialization, can reach accuracy comparable to that of the full model.

#lottery ticket hypothesis, #magnitude pruning, #sparsity
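
A rough sketch of how a candidate "ticket" is usually extracted with magnitude pruning: keep the largest-magnitude trained weights, then reset the survivors to their original initial values. The helper `magnitude_mask`, the 8x8 weight matrix, and the stand-in for "trained" weights are all hypothetical and chosen only for illustration; a real experiment would train the network and then retrain the masked subnetwork.

```python
import numpy as np

def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that keeps the largest-magnitude entries.

    `sparsity` is the fraction of weights to prune (hypothetical helper).
    """
    k = int(round(sparsity * weights.size))      # number of weights to prune
    if k == 0:
        return np.ones_like(weights)
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    return (np.abs(weights) > threshold).astype(weights.dtype)

rng = np.random.default_rng(0)
w_init = rng.standard_normal((8, 8))                     # original initialization
w_trained = w_init + 0.5 * rng.standard_normal((8, 8))   # stand-in for trained weights

mask = magnitude_mask(w_trained, sparsity=0.75)
ticket = mask * w_init    # surviving weights reset to their initial values

print(f"kept {int(mask.sum())} of {mask.size} weights")
```

The mask is computed from the trained weights but applied to the initial ones; that pairing of a learned sparsity pattern with the original initialization is what the hypothesis is about.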
📚 Theory · Intermediate

Double Descent Phenomenon

Double descent describes how test error first follows the classic U-shape with increasing model complexity, spikes near the interpolation threshold, and then drops again in the highly overparameterized regime.

#double descent, #interpolation threshold, #overparameterization
📚 Theory · Advanced

Neural Tangent Kernel (NTK)

The Neural Tangent Kernel (NTK) describes how wide neural networks train like kernel machines, turning gradient descent into kernel regression in the infinite-width limit.

#neural tangent kernel, #ntk, #nngp
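
At finite width the kernel can be computed directly as the Gram matrix of parameter gradients. The sketch below assumes a two-layer ReLU network f(x) = a · relu(W x) / sqrt(m) with hand-derived gradients; the architecture, the 1/sqrt(m) scaling, and the function names are choices made for this example, not something the card specifies.

```python
import numpy as np

def param_gradient(x, W, a):
    """Gradient of f(x) = a . relu(W x) / sqrt(m) w.r.t. (W, a), flattened."""
    m = a.shape[0]
    pre = W @ x                                        # pre-activations, shape (m,)
    grad_a = np.maximum(pre, 0.0) / np.sqrt(m)         # df/da
    grad_W = np.outer(a * (pre > 0), x) / np.sqrt(m)   # df/dW
    return np.concatenate([grad_W.ravel(), grad_a])

def empirical_ntk(X, W, a):
    """K[i, j] = <grad f(x_i), grad f(x_j)> at the given parameters."""
    J = np.stack([param_gradient(x, W, a) for x in X])
    return J @ J.T

rng = np.random.default_rng(0)
d, m = 3, 4096                     # input dimension and hidden width
W = rng.standard_normal((m, d))
a = rng.standard_normal(m)
X = rng.standard_normal((5, d))

K = empirical_ntk(X, W, a)
print(np.round(K, 3))
```

Because K is a Gram matrix it is symmetric and positive semidefinite by construction. The infinite-width statement is that, under suitable scaling, this matrix stops depending on the random draw of W and a and stays essentially constant during training.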
📚 Theory · Intermediate

Depth vs Width Tradeoffs

Depth adds compositional power: stacking layers lets neural networks represent functions with many repeated patterns using far fewer neurons than a single wide layer.

#depth vs width, #relu, #piecewise linear
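
A standard way to see the "repeated patterns" claim is to compose a tent map with itself. In the sketch below (the function names and the grid size are choices made for this example), each layer uses two ReLU units, and the number of linear pieces on [0, 1] doubles with every added layer.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def tent(x):
    """Tent map on [0, 1] built from two ReLU units: rises to 1 at x = 0.5, then falls."""
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

def count_linear_pieces(y):
    """Count linear pieces of a sampled zig-zag by counting slope sign changes."""
    slope_sign = np.sign(np.diff(y))
    return 1 + int(np.sum(slope_sign[1:] != slope_sign[:-1]))

x = np.linspace(0.0, 1.0, 4097)    # dyadic grid, so breakpoints fall on grid points
y = x.copy()
pieces = []
for depth in range(1, 6):
    y = tent(y)                    # one more layer of two ReLU units
    pieces.append(count_linear_pieces(y))

print(pieces)  # [2, 4, 8, 16, 32]
```

Depth k therefore produces 2^k pieces with only 2k ReLU units, whereas a single hidden layer of n ReLU units on a scalar input can produce at most n + 1 pieces, so matching the depth-k zig-zag with one layer needs on the order of 2^k units.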
📚 Theory · Intermediate

Scaling Laws

Scaling laws say that model loss typically falls as a power law, improving predictably as you increase parameters, data, or compute.

#scaling laws, #power law, #chinchilla scaling
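
A power law is a straight line in log-log space, which is how such laws are typically fitted. The sketch below uses synthetic, noise-free losses generated from a made-up law L(N) = a · N^(-alpha); the constants are invented for illustration and are not measurements from any real model family.

```python
import numpy as np

# Synthetic losses from an invented power law; not real measurements.
true_a, true_alpha = 12.0, 0.3
n_params = np.array([1e6, 1e7, 1e8, 1e9, 1e10])
loss = true_a * n_params ** (-true_alpha)

# log L = log a - alpha * log N, so fit a line in log-log space.
slope, intercept = np.polyfit(np.log(n_params), np.log(loss), deg=1)
alpha_hat, a_hat = -slope, np.exp(intercept)

# Extrapolate one order of magnitude beyond the fitted range.
predicted = a_hat * (1e11) ** (-alpha_hat)
print(f"alpha = {alpha_hat:.3f}, a = {a_hat:.3f}, predicted loss at 1e11 params = {predicted:.4f}")
```

Fits to real training runs are noisier and often include an additive irreducible-loss term, which this pure power-law sketch leaves out.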
📚 Theory · Intermediate

Universal Approximation Theorem

The Universal Approximation Theorem (UAT) says a feedforward neural network with one sufficiently wide hidden layer and a non-polynomial activation (like sigmoid or ReLU) can approximate any continuous function on a compact set as closely as we want.

#universal approximation theorem, #cybenko, #hornik
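
Written out, the statement above takes roughly the following form for a continuous target f on a compact set K ⊂ ℝ^d and a continuous non-polynomial activation σ. The symbols (N for the number of hidden units, a_i, w_i, b_i for the weights and biases) are notation assumed for this sketch.

```latex
\forall\, \varepsilon > 0 \;\; \exists\, N \in \mathbb{N},\;
a_i, b_i \in \mathbb{R},\; w_i \in \mathbb{R}^d : \qquad
\sup_{x \in K} \Bigl|\, f(x) - \sum_{i=1}^{N} a_i \, \sigma\bigl(w_i^{\top} x + b_i\bigr) \Bigr| < \varepsilon
```

The theorem guarantees that such an N exists but says nothing about how large it must be or whether gradient descent will find the corresponding weights.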