
How I Study AI - Learn AI Papers & Lectures the Easy Way

Concepts (95)


📚 Theory · Intermediate

Layer Normalization

Layer Normalization rescales and recenters each sample across its feature dimensions, making it independent of batch size.
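As a minimal sketch of the per-sample normalization described above, assuming NumPy and a 2-D input of shape (samples, features): the function name `layer_norm`, the `eps` value, and the toy data are illustrative choices, not taken from the concept page.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize each sample across its features, then scale and shift."""
    mean = x.mean(axis=-1, keepdims=True)   # one mean per sample
    var = x.var(axis=-1, keepdims=True)     # one variance per sample
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(4, 8))  # 4 samples, 8 features
gamma = np.ones(8)   # learnable scale (identity here)
beta = np.zeros(8)   # learnable shift (zero here)

y = layer_norm(x, gamma, beta)
# Normalizing the first sample on its own gives the same row,
# because no statistic is shared across the batch.
y_single = layer_norm(x[:1], gamma, beta)
```

Because the statistics are computed along the feature axis only, the result for a sample does not change when the batch grows or shrinks.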

#layer normalization #gamma beta #feature normalization +12

📚 Theory · Intermediate

Batch Normalization

Batch Normalization rescales and recenters activations using mini-batch statistics to stabilize and speed up neural network training.
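A rough training-mode sketch of the idea, assuming NumPy and a 2-D input of shape (batch, features): the name `batch_norm_train` and the `eps` value are hypothetical, and the running averages that are typically used at inference time are deliberately left out.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Normalize each feature using statistics of the current mini-batch."""
    mean = x.mean(axis=0)   # one mean per feature, across the batch
    var = x.var(axis=0)     # one variance per feature, across the batch
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta, mean, var

rng = np.random.default_rng(0)
x = rng.normal(loc=-2.0, scale=5.0, size=(64, 3))  # 64 samples, 3 features
out, batch_mean, batch_var = batch_norm_train(x, np.ones(3), np.zeros(3))
```

Unlike layer normalization, the statistics here are taken along the batch axis, so the output for one sample depends on the other samples in the mini-batch.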

#batch normalization #mini-batch statistics #gamma beta +11

📚 Theory · Intermediate

Dropout

Dropout randomly turns off (zeros) some neurons during training to prevent the network from memorizing the training data.
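One common variant is inverted dropout, which rescales the surviving activations during training so that nothing needs to change at inference time. The sketch below assumes NumPy; the function name `inverted_dropout` and its parameters are illustrative.

```python
import numpy as np

def inverted_dropout(x, p_drop, rng, training=True):
    """Zero each activation with probability p_drop and rescale the rest."""
    if not training or p_drop == 0.0:
        return x                          # inference: identity
    keep = 1.0 - p_drop
    mask = rng.random(x.shape) < keep     # Bernoulli(keep) mask
    return x * mask / keep                # rescale so the expectation is unchanged

rng = np.random.default_rng(0)
x = np.ones(200_000)
train_out = inverted_dropout(x, p_drop=0.5, rng=rng, training=True)
eval_out = inverted_dropout(x, p_drop=0.5, rng=rng, training=False)
```

With `p_drop=0.5`, each surviving unit is doubled, so the average activation stays close to its original value.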

#dropout #inverted dropout #bernoulli mask +12

📚 Theory · Intermediate

Grokking & Delayed Generalization

Grokking is when a model suddenly starts to generalize well long after it has already memorized the training set.

#grokking #delayed generalization #weight decay +12

📚 Theory · Intermediate

Implicit Bias of Gradient Descent

In underdetermined linear systems (more variables than equations), gradient descent on the least-squares loss, started at zero, converges to the minimum Euclidean norm solution without any explicit regularizer.
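This claim can be checked numerically. The sketch below assumes a least-squares loss 0.5 * ||A w - b||^2, a random 5 x 20 system, and a step size of 1 / sigma_max^2; all of these are illustrative choices. The pseudoinverse solution serves as the minimum-norm reference.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 20))   # 5 equations, 20 unknowns
b = rng.normal(size=5)

w = np.zeros(20)                          # start at zero
lr = 1.0 / np.linalg.norm(A, 2) ** 2      # 1 / (largest singular value)^2
for _ in range(5000):
    w -= lr * A.T @ (A @ w - b)           # gradient of 0.5 * ||A w - b||^2

w_min_norm = np.linalg.pinv(A) @ b        # minimum Euclidean norm solution
gap = np.linalg.norm(w - w_min_norm)
```

The iterates never leave the row space of `A` because every gradient is of the form `A.T @ r`, which is why the limit is the minimum-norm interpolating solution.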

#implicit bias #gradient descent #minimum norm +12

📚 Theory · Intermediate

Lottery Ticket Hypothesis

The Lottery Ticket Hypothesis (LTH) says that inside a large dense neural network there exist small sparse subnetworks that, when trained in isolation from their original initialization, can reach comparable accuracy to the full model.

#lottery ticket hypothesis #magnitude pruning #sparsity +12

📚 Theory · Intermediate

Double Descent Phenomenon

Double descent describes how test error first follows the classic U-shape with increasing model complexity, spikes near the interpolation threshold, and then drops again in the highly overparameterized regime.

#double descent #interpolation threshold #overparameterization +12

📚 Theory · Intermediate

Depth vs Width Tradeoffs

Depth adds compositional power: stacking layers lets neural networks represent functions with many repeated patterns using far fewer neurons than a single wide layer.
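A standard illustration of this effect is the tent map, which a single hidden layer with two ReLU units can represent on [0, 1]. Composing it k times yields 2^k linear pieces, whereas a one-hidden-layer ReLU network on a scalar input with n units has at most n + 1 pieces. The sketch below assumes NumPy; the helper names and the grid resolution are illustrative.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def tent(x):
    """Two ReLU units: rises from 0 to 1 on [0, 0.5], falls back to 0 on [0.5, 1]."""
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

def count_linear_pieces(depth, grid_bits=12):
    """Compose the tent map `depth` times and count slope sign changes on a dyadic grid."""
    x = np.linspace(0.0, 1.0, 2 ** grid_bits + 1)
    y = x
    for _ in range(depth):
        y = tent(y)
    slope_sign = np.sign(np.diff(y))
    return int(np.sum(slope_sign[1:] != slope_sign[:-1])) + 1

pieces = [count_linear_pieces(k) for k in range(1, 6)]  # depths 1..5
```

Each extra layer adds only two units but doubles the number of linear pieces, which is the kind of repeated pattern that a single wide layer can only match with exponentially more neurons.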

#depth vs width #relu #piecewise linear +12

📚 Theory · Intermediate

Reparameterization Trick

The reparameterization trick rewrites a random variable as a deterministic function of noise that does not depend on the parameters, such as z = μ + σ · ε with ε ~ N(0, 1).
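To make the idea concrete, here is a small Monte Carlo sketch, assuming NumPy and a toy objective f(z) = z^2 chosen only for illustration. Since E[z^2] = μ^2 + σ^2, the exact gradients are 2μ and 2σ, which the pathwise estimates should approximate.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.5, 0.8

eps = rng.standard_normal(200_000)   # noise that does not depend on mu or sigma
z = mu + sigma * eps                 # z ~ N(mu, sigma^2), deterministic in (mu, sigma)

# Pathwise derivatives of f(z) = z^2 via the chain rule:
#   df/dmu = 2 z * dz/dmu = 2 z,   df/dsigma = 2 z * dz/dsigma = 2 z * eps
grad_mu = np.mean(2.0 * z)
grad_sigma = np.mean(2.0 * z * eps)
```

Because the randomness lives entirely in `eps`, the derivative can be taken inside the expectation and estimated by a plain sample average.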

#reparameterization trick #pathwise derivative #variational autoencoder +11

📚 Theory · Intermediate

Spectral Normalization

Spectral normalization rescales a weight matrix so its largest singular value (spectral norm) is at most a target value, typically 1.
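In practice the spectral norm is often estimated with power iteration rather than a full SVD. The sketch below assumes NumPy and always divides by the estimated norm, so the result has spectral norm approximately equal to the target. The function name, iteration count, and test matrix (built with known singular values 3, 1, 0.5, 0.1) are illustrative.

```python
import numpy as np

def spectral_normalize(W, n_iters=50, target=1.0, rng=None):
    """Estimate the largest singular value by power iteration and rescale W."""
    rng = np.random.default_rng(0) if rng is None else rng
    u = rng.normal(size=W.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v                      # estimate of the spectral norm
    return W * (target / sigma), sigma

# Build a 6 x 4 matrix with known singular values for the demo.
rng = np.random.default_rng(1)
Q1, _ = np.linalg.qr(rng.normal(size=(6, 4)))   # orthonormal columns
Q2, _ = np.linalg.qr(rng.normal(size=(4, 4)))   # orthogonal matrix
W = Q1 @ np.diag([3.0, 1.0, 0.5, 0.1]) @ Q2.T

W_sn, sigma_est = spectral_normalize(W)
top_singular_value = np.linalg.norm(W_sn, 2)
```

Power iteration converges quickly when the top singular value is well separated from the rest; with a small gap, more iterations may be needed.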

#spectral normalization #spectral norm #singular value +12

📚 Theory · Intermediate

Positional Encoding Theory

Self-attention is permutation-equivariant by default (reordering the input tokens just reorders the outputs), so Transformers need positional encodings to make use of word order in sequences.
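One widely used choice is the fixed sinusoidal encoding, where even dimensions hold sines and odd dimensions hold cosines at geometrically spaced frequencies. The sketch below assumes NumPy, an even `d_model`, and the conventional base of 10000; the function name is illustrative.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model, base=10000.0):
    """Return a (seq_len, d_model) matrix of sinusoidal position features."""
    assert d_model % 2 == 0, "this sketch assumes an even model dimension"
    positions = np.arange(seq_len)[:, None]                 # (seq_len, 1)
    freqs = base ** (-np.arange(0, d_model, 2) / d_model)   # (d_model / 2,)
    angles = positions * freqs                              # (seq_len, d_model / 2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions
    pe[:, 1::2] = np.cos(angles)   # odd dimensions
    return pe

pe = sinusoidal_positional_encoding(seq_len=10, d_model=16)
```

The matrix is typically added to the token embeddings, so two identical tokens at different positions receive different input vectors.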

#positional encoding #sinusoidal encoding #transformer +11

📚 Theory · Intermediate

Universal Approximation Theorems

The Universal Approximation Theorems say that a neural network with at least one hidden layer, a suitable activation, and enough hidden units can approximate any continuous function on a compact domain as closely as you like.

#universal approximation theorem #cybenko #hornik +12