How I Study AI - Learn AI Papers & Lectures the Easy Way

Concepts (14)

Groups

๐Ÿ“Linear Algebra15๐Ÿ“ˆCalculus & Differentiation10๐ŸŽฏOptimization14๐ŸŽฒProbability Theory12๐Ÿ“ŠStatistics for ML9๐Ÿ“กInformation Theory10๐Ÿ”บConvex Optimization7๐Ÿ”ขNumerical Methods6๐Ÿ•ธGraph Theory for Deep Learning6๐Ÿ”ตTopology for ML5๐ŸŒDifferential Geometry6โˆžMeasure Theory & Functional Analysis6๐ŸŽฐRandom Matrix Theory5๐ŸŒŠFourier Analysis & Signal Processing9๐ŸŽฐSampling & Monte Carlo Methods10๐Ÿง Deep Learning Theory12๐Ÿ›ก๏ธRegularization Theory11๐Ÿ‘๏ธAttention & Transformer Theory10๐ŸŽจGenerative Model Theory11๐Ÿ”ฎRepresentation Learning10๐ŸŽฎReinforcement Learning Mathematics9๐Ÿ”„Variational Methods8๐Ÿ“‰Loss Functions & Objectives10โฑ๏ธSequence & Temporal Models8๐Ÿ’ŽGeometric Deep Learning8

โš™๏ธAlgorithmIntermediate

Sharpness-Aware Minimization (SAM)

Sharpness-Aware Minimization (SAM) trains models to perform well even when their weights are slightly perturbed, seeking flatter minima that generalize better.

#sharpness-aware minimization · #sam optimizer · #robust optimization · +11
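
A minimal NumPy sketch of the two-step SAM update, assuming a toy 2-D loss and illustrative values for the learning rate and neighborhood radius rho (none of these come from the card):

```python
import numpy as np

def loss(w):
    # Toy 2-D loss, invented for illustration.
    return 0.5 * w[0] ** 2 + np.sin(3.0 * w[1]) ** 2

def grad(w, h=1e-6):
    # Central-difference gradient; a real implementation would use autograd.
    g = np.zeros_like(w)
    for i in range(len(w)):
        d = np.zeros_like(w)
        d[i] = h
        g[i] = (loss(w + d) - loss(w - d)) / (2 * h)
    return g

def sam_step(w, lr=0.1, rho=0.05):
    g = grad(w)
    # Step 1: ascend to the (approximate) worst point in an L2 ball of radius rho.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Step 2: descend using the gradient taken at the perturbed weights.
    return w - lr * grad(w + eps)

w = np.array([1.0, 0.7])
for _ in range(100):
    w = sam_step(w)
print(w, loss(w))  # settles into a low, locally flat region of the toy loss
```
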
∑ Math · Intermediate

Cross-Entropy Loss

Cross-entropy loss measures how well predicted probabilities match the true labels by penalizing confident wrong predictions heavily.

#cross-entropy · #logistic regression · #binary cross-entropy · #softmax · +11
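
A short NumPy sketch of softmax cross-entropy on hypothetical logits, showing how a confident wrong prediction costs far more than a confident right one:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(logits, true_class):
    # CE = -log p(true class): near 0 when confident and right,
    # large when confident and wrong.
    p = softmax(logits)
    return -np.log(p[true_class] + 1e-12)

logits = np.array([2.0, 0.5, -1.0])  # hypothetical model outputs
print(cross_entropy(logits, 0))      # confident and correct: small loss
print(cross_entropy(logits, 2))      # confident and wrong: large loss
```
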
📚 Theory · Intermediate

RLHF Mathematics

RLHF turns human preferences between two model outputs into training signals using a probabilistic model of choice.

#rlhf · #bradley-terry · #pairwise comparisons · +11
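
A minimal sketch of the Bradley-Terry preference loss behind RLHF reward-model training; the scalar rewards below are hypothetical stand-ins for reward-model outputs:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def preference_loss(r_chosen, r_rejected):
    # Bradley-Terry: P(chosen beats rejected) = sigmoid(r_chosen - r_rejected).
    # Training maximizes the log-probability of the human's actual choice.
    return -np.log(sigmoid(r_chosen - r_rejected) + 1e-12)

print(preference_loss(1.5, -0.5))  # reward model agrees with the human: low loss
print(preference_loss(-0.5, 1.5))  # reward model disagrees: high loss
```
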
∑ Math · Intermediate

Elastic Net Regularization

Elastic Net regularization combines L1 (Lasso) and L2 (Ridge) penalties to produce models that are both sparse and stable.

#elastic net · #lasso · #ridge regression · +12
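
A small NumPy sketch of one proximal-gradient step under an Elastic Net penalty; the weights, step size, and penalty strengths are illustrative, not from the card:

```python
import numpy as np

def soft_threshold(w, t):
    # Proximal operator of the L1 term: shrinks weights toward exactly zero.
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def elastic_net_step(w, grad_loss, lr=0.1, l1=0.5, l2=0.5):
    # Gradient step on the smooth part (data loss plus the L2 ridge term),
    # then a proximal step on the non-smooth L1 lasso term.
    w = w - lr * (grad_loss + l2 * w)
    return soft_threshold(w, lr * l1)

w = np.array([1.0, -0.03, 0.8])
print(elastic_net_step(w, np.zeros(3)))  # tiny weight snaps to zero, others shrink
```
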
∑ Math · Intermediate

L2 Regularization (Ridge/Weight Decay)

L2 regularization (also called ridge or weight decay) adds a penalty proportional to the sum of squared weights to discourage large parameters.

#l2 regularization · #ridge regression · #weight decay · +12
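
A one-function sketch of the weight-decay reading of L2 regularization, assuming plain SGD and illustrative hyperparameters:

```python
import numpy as np

def sgd_weight_decay_step(w, grad_loss, lr=0.1, wd=0.01):
    # Adding (wd/2) * ||w||^2 to the loss contributes wd * w to the gradient,
    # which is identical to shrinking every weight by (1 - lr * wd) each step.
    return (1 - lr * wd) * w - lr * grad_loss

w = np.array([1.0, -2.0])
print(sgd_weight_decay_step(w, np.zeros_like(w)))  # pure decay: weights shrink
```
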
📚 Theory · Intermediate

Grokking & Delayed Generalization

Grokking is the phenomenon in which a model suddenly starts to generalize well long after it has memorized the training set.

#grokking · #delayed generalization · #weight decay · +12
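
A tiny helper, run here on hypothetical accuracy logs, that measures the gap between memorization and generalization; real grokking runs show this gap over many thousands of steps:

```python
import numpy as np

def grokking_delay(train_acc, val_acc, threshold=0.99):
    # First steps at which train and validation accuracy cross the threshold;
    # a large gap between the two is the delayed-generalization signature.
    t_fit = int(np.argmax(np.asarray(train_acc) >= threshold))
    t_gen = int(np.argmax(np.asarray(val_acc) >= threshold))
    return t_gen - t_fit

# Invented logs: the model memorizes at step 2 but only generalizes at step 8.
train = [0.5, 0.8, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
val   = [0.1, 0.1, 0.1, 0.2, 0.2, 0.3, 0.5, 0.8, 1.0, 1.0]
print(grokking_delay(train, val))  # 6 steps of delay
```
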
📚 Theory · Intermediate

Cross-Entropy

Cross-entropy measures how well a proposed distribution Q predicts outcomes actually generated by a true distribution P.

#cross-entropy · #entropy · #kl divergence · +12
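
A NumPy sketch of the information-theoretic definitions, numerically checking the identity H(P, Q) = H(P) + KL(P‖Q) on two invented distributions:

```python
import numpy as np

def entropy(p):
    p = np.asarray(p)
    return -np.sum(p * np.log(p + 1e-12))

def cross_entropy(p, q):
    # H(P, Q) = -sum_x P(x) log Q(x): the expected code length when events
    # drawn from P are encoded with a code optimized for Q.
    return -np.sum(np.asarray(p) * np.log(np.asarray(q) + 1e-12))

p = [0.7, 0.2, 0.1]  # "true" distribution
q = [0.5, 0.3, 0.2]  # proposed distribution
# H(P, Q) >= H(P), and the excess is exactly KL(P || Q).
print(cross_entropy(p, q), entropy(p), cross_entropy(p, q) - entropy(p))
```
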
∑ Math · Intermediate

Maximum A Posteriori (MAP) Estimation

Maximum A Posteriori (MAP) estimation chooses the parameter value with the highest posterior probability after seeing data.

#map estimation · #posterior mode · #bayesian inference · +12
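
A minimal sketch of MAP estimation for a coin's bias under an assumed Beta(2, 2) prior, where the posterior mode has a closed form:

```python
def map_bernoulli(heads, flips, a=2.0, b=2.0):
    # With a Beta(a, b) prior, the posterior over the bias is
    # Beta(heads + a, tails + b), whose mode is:
    return (heads + a - 1) / (flips + a + b - 2)

# 3 heads in 4 flips: the MLE would say 0.75; the prior pulls toward 0.5.
print(map_bernoulli(3, 4))  # ~0.667
```
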
∑ Math · Intermediate

Maximum Likelihood Estimation (MLE)

Maximum Likelihood Estimation (MLE) chooses parameters that make the observed data most probable under a chosen model.

#maximum likelihood · #log-likelihood · #bernoulli mle · +12
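
A short sketch that recovers the Bernoulli MLE by grid search over the log-likelihood of an invented four-flip dataset:

```python
import numpy as np

def log_likelihood(theta, data):
    # Sum of log Bernoulli(theta) probabilities of the observed flips (1 = heads).
    data = np.asarray(data)
    return np.sum(data * np.log(theta) + (1 - data) * np.log(1 - theta))

data = [1, 1, 1, 0]                  # 3 heads, 1 tail
grid = np.linspace(0.01, 0.99, 99)
mle = grid[np.argmax([log_likelihood(t, data) for t in grid])]
print(mle)  # 0.75 = heads / flips, matching the closed-form Bernoulli MLE
```
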
📚 Theory · Intermediate

Loss Landscape Analysis

A loss landscape is the โ€œterrainโ€ of a modelโ€™s loss as you move through parameter space; valleys are good solutions and peaks are bad ones.

#loss landscape · #sharpness · #hessian eigenvalues · +12
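
A toy 1-D sketch contrasting a sharp and a flat valley with a crude neighborhood-based sharpness proxy; the loss function is invented, and real analyses use Hessian eigenvalues or random 2-D slices:

```python
import numpy as np

def loss(w):
    # Invented landscape: a sharp valley at w = 0, a flatter one near w = 3.
    return np.minimum(50.0 * w ** 2, (w - 3.0) ** 2 + 0.1)

def sharpness(w_star, radius=0.5, n=101):
    # Worst-case loss increase in a small neighborhood of a minimum.
    ws = np.linspace(w_star - radius, w_star + radius, n)
    return loss(ws).max() - loss(w_star)

print(sharpness(0.0))  # sharp valley: loss climbs steeply nearby
print(sharpness(3.0))  # flat valley: loss barely moves nearby
```
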
โš™๏ธAlgorithmIntermediate

Momentum Methods

Momentum methods add an exponentially weighted memory of past gradients to make descent steps smoother and faster, especially in ravines and ill-conditioned problems.

#momentum · #heavy-ball · #polyak momentum · +12
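
A minimal heavy-ball sketch on an invented ill-conditioned quadratic ravine; the exponentially weighted gradient memory damps oscillation along the steep axis while accelerating progress along the shallow one:

```python
import numpy as np

def grad(w):
    # Gradient of an ill-conditioned quadratic: steep in w[0], shallow in w[1].
    return np.array([10.0 * w[0], 0.1 * w[1]])

def momentum_descent(w, steps=200, lr=0.05, beta=0.9):
    v = np.zeros_like(w)
    for _ in range(steps):
        v = beta * v + grad(w)  # exponentially weighted memory of past gradients
        w = w - lr * v          # heavy-ball update
    return w

print(momentum_descent(np.array([1.0, 1.0])))  # both coordinates driven near 0
```
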
โš™๏ธAlgorithmIntermediate

Stochastic Gradient Descent (SGD)

Stochastic Gradient Descent (SGD) updates model parameters using small random subsets (mini-batches) of data, making learning faster and more memory-efficient.

#stochastic gradient descent · #mini-batch · #random shuffling · +12
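
A self-contained mini-batch SGD sketch on synthetic linear-regression data; the batch size, learning rate, and epoch count are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=1000)

w = np.zeros(3)
lr, batch = 0.1, 32
for epoch in range(20):
    idx = rng.permutation(len(X))        # reshuffle the data every epoch
    for start in range(0, len(X), batch):
        b = idx[start:start + batch]     # small random mini-batch
        err = X[b] @ w - y[b]
        w -= lr * X[b].T @ err / len(b)  # gradient of the mean squared error
print(w)  # recovers something close to true_w
```
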