Concepts (12)

📚 Theory · Intermediate

Multi-Task Loss Balancing

Multi-task loss balancing aims to automatically set each task's weight so that no single loss dominates training.

#multi-task learning #uncertainty weighting #homoscedastic uncertainty +12
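A minimal sketch of the uncertainty-weighting scheme the tags point to, assuming PyTorch and hypothetical per-task losses: each task gets a learnable log-variance s and contributes exp(-s) · L + s to the total, so noisy or large-scale losses are automatically down-weighted.

```python
import torch

class UncertaintyWeighting(torch.nn.Module):
    """One learnable log-variance s per task; total loss = sum_i exp(-s_i) * L_i + s_i."""

    def __init__(self, num_tasks: int):
        super().__init__()
        self.log_vars = torch.nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses):
        total = torch.zeros(())
        for s, task_loss in zip(self.log_vars, task_losses):
            # exp(-s) shrinks the weight of high-uncertainty tasks; the +s term
            # penalizes inflating s, so all weights cannot collapse to zero.
            total = total + torch.exp(-s) * task_loss + s
        return total

# Usage sketch (model, seg_loss, depth_loss are hypothetical):
# weighter = UncertaintyWeighting(num_tasks=2)
# optimizer = torch.optim.Adam(list(model.parameters()) + list(weighter.parameters()))
# total = weighter([seg_loss, depth_loss]); total.backward(); optimizer.step()
```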
📚 Theory · Advanced

In-Context Learning Theory

In-context learning (ICL) means a model learns from examples provided in the input itself, without updating its parameters.

#in-context learning
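A tiny illustration of the setting ICL theory studies: the "training examples" live entirely in the prompt, and the model's parameters are never updated. The review/sentiment template below is made up for illustration.

```python
# In-context "training set": a few labeled examples placed directly in the prompt.
examples = [
    ("The movie was a delight.", "positive"),
    ("I want my money back.", "negative"),
    ("Best concert I've seen in years.", "positive"),
]
query = "The service was painfully slow."

prompt = "\n\n".join(f"Review: {text}\nSentiment: {label}" for text, label in examples)
prompt += f"\n\nReview: {query}\nSentiment:"

print(prompt)
# A frozen language model completing this prompt is "learning" the task purely by
# conditioning on the examples in its context window -- no parameter update occurs.
```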
📚 Theory · Intermediate

Early Stopping

Early stopping halts training when the validation loss stops improving, preventing overfitting and saving compute.

#early stopping #validation loss #patience +11
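A minimal patience-based loop, assuming hypothetical train_one_epoch and evaluate callables: training halts once the validation loss has failed to improve for `patience` consecutive epochs, and the best checkpoint seen so far is returned.

```python
import copy

def train_with_early_stopping(model, train_one_epoch, evaluate,
                              max_epochs=100, patience=5):
    """Stop once validation loss has not improved for `patience` consecutive epochs."""
    best_val = float("inf")
    best_model = None
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_one_epoch(model)              # one pass over the training data
        val_loss = evaluate(model)          # loss on a held-out validation set
        if val_loss < best_val:
            best_val = val_loss
            best_model = copy.deepcopy(model)   # keep the best checkpoint seen so far
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                print(f"early stopping at epoch {epoch} (best val loss {best_val:.4f})")
                break
    return best_model, best_val
```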
📚 Theory · Advanced

Feature Learning vs Kernel Regime

The kernel (lazy) regime keeps neural network parameters close to their initialization, making training equivalent to kernel regression with a fixed kernel such as the Neural Tangent Kernel (NTK).

#neural tangent kernel #kernel ridge regression #lazy training +12
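A NumPy sketch of what the lazy regime asserts, using a toy one-hidden-layer network chosen for illustration: freeze the Jacobian at initialization, treat it as fixed features (their Gram matrix is the empirical NTK), and fit only within that linearization via ridge regression. In the feature-learning regime, by contrast, the Jacobian, and hence the kernel, would move during training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-hidden-layer net f(x) = sum_j a_j * tanh(w_j x + b_j), frozen at a random init.
m = 64
w, b, a = rng.normal(size=m), rng.normal(size=m), rng.normal(size=m) / np.sqrt(m)

def f_init(x):
    return np.tanh(np.outer(x, w) + b) @ a          # network output at initialization

def ntk_features(x):
    """Jacobian of f w.r.t. all parameters (a, w, b) at init -- the frozen 'NTK features'."""
    h = np.tanh(np.outer(x, w) + b)                 # (n, m)
    dh = 1.0 - h ** 2                               # tanh'
    return np.concatenate([h, a * dh * x[:, None], a * dh], axis=1)   # (n, 3m)

x_train = np.linspace(-3, 3, 40)
y_train = np.sin(x_train)

# Lazy-regime predictor = kernel ridge regression with the fixed empirical NTK.
Phi = ntk_features(x_train)
K = Phi @ Phi.T                                     # empirical NTK Gram matrix
alpha = np.linalg.solve(K + 1e-6 * np.eye(len(x_train)), y_train - f_init(x_train))

x_test = np.linspace(-3, 3, 200)
pred = f_init(x_test) + ntk_features(x_test) @ Phi.T @ alpha
print("lazy-regime test MSE:", np.mean((pred - np.sin(x_test)) ** 2))
```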
📚 Theory · Intermediate

Implicit Bias of Gradient Descent

In underdetermined linear systems (more variables than equations), gradient descent started at zero converges to the minimum Euclidean norm solution without any explicit regularizer.

#implicit bias #gradient descent #minimum norm +12
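A quick NumPy check of the claim: on a random underdetermined least-squares problem, gradient descent started from zero ends up (numerically) at the same point as the explicit minimum-norm solution given by the pseudoinverse.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 20))          # 5 equations, 20 unknowns: underdetermined
b = rng.normal(size=5)

# Gradient descent on 0.5 * ||A x - b||^2, started from x = 0.
x = np.zeros(20)
step = 1.0 / np.linalg.norm(A, 2) ** 2    # step size 1/L with L = sigma_max(A)^2
for _ in range(50_000):
    x -= step * A.T @ (A @ x - b)

x_min_norm = np.linalg.pinv(A) @ b        # explicit minimum-Euclidean-norm solution

print("residual ||Ax - b||   :", np.linalg.norm(A @ x - b))
print("gap to min-norm sol.  :", np.linalg.norm(x - x_min_norm))
```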
📚 Theory · Intermediate

Universal Approximation Theorems

The Universal Approximation Theorems say that a neural network with at least one hidden layer and a suitable activation can approximate any continuous function on a compact domain as closely as you like.

#universal approximation theorem #cybenko #hornik +12
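The theorem itself is an existence statement, but a small NumPy experiment makes it plausible: a single hidden layer of tanh units with random weights, plus a least-squares readout, already tracks a smooth target closely on a compact interval. The target function and widths below are arbitrary illustrative choices, not part of the theorem.

```python
import numpy as np

rng = np.random.default_rng(0)

# A continuous target on a compact interval.
x = np.linspace(-np.pi, np.pi, 400)
target = np.sin(3 * x) + 0.5 * np.cos(x)

# One hidden layer of tanh units with random weights; only the readout is fitted.
hidden = 200
W = rng.normal(scale=2.0, size=hidden)
B = rng.uniform(-np.pi, np.pi, size=hidden)
H = np.tanh(np.outer(x, W) + B)                   # hidden activations, shape (400, hidden)

v, *_ = np.linalg.lstsq(H, target, rcond=None)    # least-squares output weights
approx = H @ v

print("max |target - network| on the grid:", np.max(np.abs(target - approx)))
```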
📚 Theory · Intermediate

Empirical Risk Minimization

Empirical Risk Minimization (ERM) chooses a model that minimizes the average loss on the training data.

#empirical risk minimization #expected risk #loss function +12
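In symbols, the distinction behind the description: the empirical risk is the computable training average, the expected risk is the unknown population quantity it approximates, and ERM picks the hypothesis minimizing the former.

```latex
\[
\hat{R}_n(f) \;=\; \frac{1}{n}\sum_{i=1}^{n} \ell\bigl(f(x_i),\, y_i\bigr)
\quad\text{(empirical risk)},
\qquad
R(f) \;=\; \mathbb{E}_{(x,y)\sim\mathcal{D}}\bigl[\ell(f(x),\, y)\bigr]
\quad\text{(expected risk)},
\]
\[
\hat{f}_{\mathrm{ERM}} \;=\; \arg\min_{f \in \mathcal{F}} \hat{R}_n(f).
\]
```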
📚 Theory · Intermediate

Loss Landscape Analysis

A loss landscape is the "terrain" of a model's loss as you move through parameter space; valleys are good solutions and peaks are bad ones.

#loss landscape #sharpness #hessian eigenvalues +12
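A minimal NumPy probe of a landscape: evaluate the loss along a one-dimensional slice through a minimizer in a random direction. The toy model is linear regression, so this slice is exactly a parabola; for deep networks the same recipe is what sharpness and Hessian-eigenvalue analyses build on.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: linear regression, so the landscape is a quadratic bowl.
X = rng.normal(size=(100, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=100)

def loss(theta):
    return np.mean((X @ theta - y) ** 2)

theta_star, *_ = np.linalg.lstsq(X, y, rcond=None)    # a minimizer of the loss

# One-dimensional slice: L(theta* + alpha * d) along a random unit direction d.
d = rng.normal(size=10)
d /= np.linalg.norm(d)
for alpha in np.linspace(-2.0, 2.0, 9):
    print(f"alpha = {alpha:+.2f}   loss = {loss(theta_star + alpha * d):.4f}")
```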
📚 Theory · Advanced

Calculus of Variations

Calculus of variations optimizes functionals (numbers produced by whole functions) rather than ordinary functions of numbers.

#calculus of variations #euler–lagrange #functional derivative +12
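The central objects, written out: a functional of a curve and the Euler–Lagrange equation its extremals must satisfy, with the classic arc-length example noted in the comment.

```latex
\[
J[y] \;=\; \int_a^b L\bigl(x,\, y(x),\, y'(x)\bigr)\, dx,
\qquad
\frac{\partial L}{\partial y} \;-\; \frac{d}{dx}\,\frac{\partial L}{\partial y'} \;=\; 0.
\]
% Example: arc length L = sqrt(1 + (y')^2) has dL/dy = 0, so the Euler-Lagrange
% equation gives y'' = 0: the shortest curve between two points is a straight line.
```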
📚 Theory · Intermediate

Convex Optimization

Convex optimization studies minimizing convex functions over convex sets, where every local minimum is guaranteed to be a global minimum.

#convex optimization #convex function #convex set +12
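The two defining inequalities, stated precisely; the local-equals-global guarantee in the description follows directly from them.

```latex
% Convex set and convex function, for all x, y and all theta in [0, 1]:
\[
x, y \in C \;\Rightarrow\; \theta x + (1-\theta)\,y \in C,
\qquad
f\bigl(\theta x + (1-\theta)\,y\bigr) \;\le\; \theta f(x) + (1-\theta)\,f(y).
\]
% For such an f minimized over such a C, any local minimizer is a global minimizer.
```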
📚 Theory · Intermediate

Optimization Theory

Optimization theory studies how to choose variables to minimize or maximize an objective while respecting constraints.

#optimization #convex optimization #gradient descent +12
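A compact NumPy sketch of the "objective plus constraints" picture: projected gradient descent on a toy quadratic with box constraints, where each iteration takes an unconstrained gradient step and then projects back onto the feasible set. The objective and bounds are arbitrary illustrative choices.

```python
import numpy as np

# Minimize f(x) = ||x - c||^2 subject to the box constraint 0 <= x <= 1.
c = np.array([1.7, -0.3, 0.4])
lower, upper = 0.0, 1.0

def grad(x):
    return 2.0 * (x - c)

x = np.zeros_like(c)
step = 0.1
for _ in range(200):
    x = x - step * grad(x)            # unconstrained gradient step
    x = np.clip(x, lower, upper)      # project back onto the feasible set

print("constrained minimizer:", x)    # expect clip(c, 0, 1) = [1.0, 0.0, 0.4]
```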
📚 Theory · Intermediate

Gradient Descent Convergence Theory

Gradient descent updates parameters by stepping opposite the gradient: $x_{t+1} = x_t - \eta \nabla f(x_t)$.

#gradient descent #convergence rate #l-smooth +12
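Two textbook guarantees that this theory delivers for the update above, stated under standard assumptions (L-smoothness, plus convexity for the second bound) with step size η = 1/L.

```latex
\[
\min_{0 \le t < T} \|\nabla f(x_t)\|^2 \;\le\; \frac{2L\bigl(f(x_0) - f^\star\bigr)}{T}
\qquad \text{($L$-smooth, possibly nonconvex)},
\]
\[
f(x_T) - f(x^\star) \;\le\; \frac{L\,\|x_0 - x^\star\|^2}{2T}
\qquad \text{($L$-smooth and convex)}.
\]
```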