How I Study AI - Learn AI Papers & Lectures the Easy Way

Concepts (152)

Groups

- 📐 Linear Algebra (15)
- 📈 Calculus & Differentiation (10)
- 🎯 Optimization (14)
- 🎲 Probability Theory (12)
- 📊 Statistics for ML (9)
- 📡 Information Theory (10)
- 🔺 Convex Optimization (7)
- 🔢 Numerical Methods (6)
- 🕸 Graph Theory for Deep Learning (6)
- 🔵 Topology for ML (5)
- 🌐 Differential Geometry (6)
- ∞ Measure Theory & Functional Analysis (6)
- 🎰 Random Matrix Theory (5)
- 🌊 Fourier Analysis & Signal Processing (9)
- 🎰 Sampling & Monte Carlo Methods (10)
- 🧠 Deep Learning Theory (12)
- 🛡️ Regularization Theory (11)
- 👁️ Attention & Transformer Theory (10)
- 🎨 Generative Model Theory (11)
- 🔮 Representation Learning (10)
- 🎮 Reinforcement Learning Mathematics (9)
- 🔄 Variational Methods (8)
- 📉 Loss Functions & Objectives (10)
- ⏱️ Sequence & Temporal Models (8)
- 💎 Geometric Deep Learning (8)

📚 Theory · Intermediate

Scaled Dot-Product Attention

Scaled dot-product attention scores how much each value in V should contribute to a query by taking dot products of the query with the keys K, scaling by \(\sqrt{d_k}\), applying softmax, and forming the weighted sum \(\mathrm{Attention}(Q, K, V) = \mathrm{softmax}(QK^\top/\sqrt{d_k})\,V\).

#scaled dot-product attention · #softmax · #transformer · +10
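The recipe above can be sketched in a few lines of NumPy (function and variable names are my own, not from the card):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (n_queries, n_keys)
    scores -= scores.max(axis=-1, keepdims=True)  # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V, weights                   # weighted sum of values

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))   # 2 queries, d_k = 4
K = rng.normal(size=(3, 4))   # 3 keys
V = rng.normal(size=(3, 5))   # 3 values, d_v = 5
out, weights = scaled_dot_product_attention(Q, K, V)
```

Each row of `weights` is a probability distribution over the keys, so the output is a convex combination of the value vectors.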
📚 Theory · Intermediate

Stochastic Depth

Stochastic Depth randomly drops whole residual layers during training while keeping the full network at inference time.

#stochastic depth · #resnet · #residual block · +12
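A minimal sketch of one residual block with stochastic depth (names and the toy branch are illustrative; real implementations drop branches per-sample and track survival schedules):

```python
import numpy as np

def residual_block_forward(x, branch, survival_prob, training, rng):
    """y = x + branch(x), where the branch may be dropped during training."""
    if training:
        if rng.random() < survival_prob:
            return x + branch(x)   # branch survives this pass
        return x                   # whole residual branch skipped: identity only
    # At inference the full network runs; the branch is scaled by its
    # survival probability so expected activations match training.
    return x + survival_prob * branch(x)

rng = np.random.default_rng(0)
branch = lambda x: 0.5 * x        # toy residual branch
x = np.ones(3)
y_train = residual_block_forward(x, branch, survival_prob=0.8, training=True, rng=rng)
y_eval = residual_block_forward(x, branch, survival_prob=0.8, training=False, rng=rng)
```

During training each output is either `x` or `x + branch(x)`; at inference the deterministic scaled sum is used.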
📚 Theory · Intermediate

Spectral Regularization

Spectral regularization controls how much a weight matrix can stretch inputs by constraining its largest singular value (spectral norm).

#spectral regularization · #spectral norm · #power iteration · +11
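The spectral norm is typically estimated with power iteration, as the tags suggest. A minimal sketch (names are mine; practical layers reuse the iterate across training steps rather than restarting):

```python
import numpy as np

def spectral_norm(W, n_iters=50, rng=None):
    """Estimate the largest singular value of W by power iteration."""
    rng = rng or np.random.default_rng(0)
    v = rng.normal(size=W.shape[1])
    for _ in range(n_iters):
        u = W @ v
        u /= np.linalg.norm(u)     # left singular vector estimate
        v = W.T @ u
        v /= np.linalg.norm(v)     # right singular vector estimate
    return u @ W @ v               # sigma_max estimate

W = np.array([[3.0, 0.0],
              [0.0, 1.0]])         # singular values 3 and 1
sigma = spectral_norm(W)
W_sn = W / sigma                   # spectrally normalized: can't stretch inputs
```

Dividing by the estimated norm caps the matrix's largest singular value at roughly 1, which is the core of spectral normalization.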
📚 Theory · Intermediate

Early Stopping

Early stopping halts training when the validation loss stops improving, preventing overfitting and saving compute.

#early stopping · #validation loss · #patience · +11
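A minimal patience-based early-stopping helper (class and parameter names are illustrative, not a specific library's API):

```python
class EarlyStopping:
    """Stop when validation loss hasn't improved for `patience` checks."""
    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_checks = 0

    def step(self, val_loss):
        """Record one validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss      # improvement: reset the counter
            self.bad_checks = 0
        else:
            self.bad_checks += 1      # no improvement this check
        return self.bad_checks >= self.patience

stopper = EarlyStopping(patience=2)
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73]
stopped_at = next(i for i, L in enumerate(losses) if stopper.step(L))
```

Here training would stop at the fifth check (index 4), after two consecutive checks without improvement over the best loss of 0.7; in practice one also restores the weights saved at the best checkpoint.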
📚 Theory · Intermediate

Label Smoothing

Label smoothing replaces a hard one-hot target with a slightly softened distribution to reduce model overconfidence.

#label smoothing · #cross-entropy · #softmax · +12
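The softened target is just a convex mix of the one-hot vector and the uniform distribution. A minimal sketch (function name is mine):

```python
import numpy as np

def smooth_labels(y_onehot, eps=0.1):
    """Mix the one-hot target with the uniform distribution over classes."""
    n_classes = y_onehot.shape[-1]
    return (1.0 - eps) * y_onehot + eps / n_classes

y = np.array([0.0, 0.0, 1.0, 0.0])     # hard target: class 2 of 4
y_smooth = smooth_labels(y, eps=0.1)   # [0.025, 0.025, 0.925, 0.025]
```

The true class keeps most of the mass (0.925 here) while every other class gets a small floor, so the cross-entropy loss never pushes the model toward infinite logit gaps.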
📚 Theory · Intermediate

Data Augmentation Theory

Data augmentation expands the training distribution by applying label-preserving transformations to inputs, which lowers overfitting and improves generalization.

#data augmentation · #vicinal risk minimization · #invariance · +12
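In the vicinal-risk-minimization view, each training point is replaced by a distribution of nearby label-preserving variants. A toy sketch (the specific transforms and names are illustrative):

```python
import numpy as np

def augment(image, rng):
    """Apply label-preserving transforms: random horizontal flip + small noise."""
    if rng.random() < 0.5:
        image = image[:, ::-1]        # horizontal flip preserves most labels
    return image + rng.normal(scale=0.01, size=image.shape)  # jitter

rng = np.random.default_rng(0)
image = np.arange(12.0).reshape(3, 4)  # stand-in for a real image
batch = np.stack([augment(image, rng) for _ in range(8)])
```

One original sample yields eight distinct training samples, all sharing the same label, which effectively enlarges the training distribution around each point.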
📚 Theory · Intermediate

Layer Normalization

Layer Normalization rescales and recenters each sample across its feature dimensions, so the normalization is independent of batch size.

#layer normalization · #gamma beta · #feature normalization · +12
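A minimal sketch: statistics are taken per sample over the feature axis, then learnable gamma/beta restore expressiveness (names are mine):

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize each sample over its own features, then rescale/shift."""
    mean = x.mean(axis=-1, keepdims=True)   # per-sample statistics
    var = x.var(axis=-1, keepdims=True)     # (no batch dimension involved)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.array([[ 1.0,  2.0,  3.0,  4.0],
              [10.0, 20.0, 30.0, 40.0]])
gamma, beta = np.ones(4), np.zeros(4)
y = layer_norm(x, gamma, beta)
```

Because each row is normalized on its own, the second row (a 10x-scaled copy of the first) maps to nearly the same output, and the result would be identical with a batch of one.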
📚 Theory · Intermediate

Batch Normalization

Batch Normalization rescales and recenters activations using mini-batch statistics to stabilize and speed up neural network training.

#batch normalization · #mini-batch statistics · #gamma beta · +11
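Contrast with layer norm: here the statistics are taken per feature over the batch axis. A training-mode sketch (names are mine; a real layer also maintains running averages for inference):

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch (training mode only)."""
    mean = x.mean(axis=0)                  # per-feature batch statistics
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # A real layer would update running mean/var here for use at inference.
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(32, 4))  # shifted, scaled batch
y = batch_norm_train(x, gamma=np.ones(4), beta=np.zeros(4))
```

Whatever shift and scale the incoming activations have, each feature of the output is recentered to mean 0 and rescaled to unit variance, which is what stabilizes and speeds up training.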
📚 Theory · Intermediate

Dropout

Dropout randomly turns off (zeros) some neurons during training to prevent the network from memorizing the training data.

#dropout · #inverted dropout · #bernoulli mask · +12
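A sketch of inverted dropout, matching the tags: a Bernoulli keep-mask zeros units, and survivors are scaled up so the expected activation is unchanged (names are mine):

```python
import numpy as np

def inverted_dropout(x, p_drop, training, rng):
    """Zero each unit with prob p_drop; scale survivors by 1/(1-p_drop)."""
    if not training or p_drop == 0.0:
        return x                          # inference: identity, no rescaling
    mask = rng.random(x.shape) >= p_drop  # Bernoulli keep-mask
    return x * mask / (1.0 - p_drop)      # keeps E[output] equal to x

rng = np.random.default_rng(0)
x = np.ones(10000)
y = inverted_dropout(x, p_drop=0.3, training=True, rng=rng)
```

About 30% of the units come out exactly zero, while the mean activation stays near 1.0; because the scaling happens at training time, inference needs no correction at all.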
📚 Theory · Advanced

Feature Learning vs Kernel Regime

The kernel (lazy) regime keeps neural network parameters close to their initialization, making training equivalent to kernel regression with a fixed kernel such as the Neural Tangent Kernel (NTK).

#neural tangent kernel · #kernel ridge regression · #lazy training · +12
📚 Theory · Intermediate

Grokking & Delayed Generalization

Grokking is a phenomenon in which a model suddenly begins to generalize well long after it has already memorized the training set.

#grokking · #delayed generalization · #weight decay · +12
📚 Theory · Advanced

Mean Field Theory of Neural Networks

Mean field theory treats very wide randomly initialized neural networks as averaging machines where each neuron behaves like a sample from a common distribution.

#mean field theory · #neural tangent kernel · #neural network gaussian process · +12