How I Study AI - Learn AI Papers & Lectures the Easy Way

Concepts (140)

Groups

Linear Algebra (15)
Calculus & Differentiation (10)
Optimization (14)
Probability Theory (12)
Statistics for ML (9)
Information Theory (10)
Convex Optimization (7)
Numerical Methods (6)
Graph Theory for Deep Learning (6)
Topology for ML (5)
Differential Geometry (6)
Measure Theory & Functional Analysis (6)
Random Matrix Theory (5)
Fourier Analysis & Signal Processing (9)
Sampling & Monte Carlo Methods (10)
Deep Learning Theory (12)
Regularization Theory (11)
Attention & Transformer Theory (10)
Generative Model Theory (11)
Representation Learning (10)
Reinforcement Learning Mathematics (9)
Variational Methods (8)
Loss Functions & Objectives (10)
Sequence & Temporal Models (8)
Geometric Deep Learning (8)

∑ Math · Intermediate

Huber Loss & Smooth L1

Huber loss behaves like mean squared error (quadratic) for small residuals and like mean absolute error (linear) for large residuals, giving smooth gradients near zero while limiting the influence of outliers.

#huber loss #smooth l1 #robust regression (+12 more)
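
A minimal sketch of the two regimes described above, for a single residual. The function names and the `delta`/`beta` parameters are illustrative assumptions, and the Smooth L1 form shown is one common convention (Huber loss divided by the threshold), not something this page specifies.

```python
def huber_loss(residual: float, delta: float = 1.0) -> float:
    """Huber loss for one residual r = y_true - y_pred (delta is an assumed threshold)."""
    abs_r = abs(residual)
    if abs_r <= delta:
        # Quadratic zone: behaves like (half) squared error.
        return 0.5 * residual ** 2
    # Linear zone: behaves like absolute error, shifted so the pieces join smoothly.
    return delta * (abs_r - 0.5 * delta)


def smooth_l1_loss(residual: float, beta: float = 1.0) -> float:
    """Smooth L1 under one common convention: Huber loss with delta = beta, divided by beta."""
    abs_r = abs(residual)
    if abs_r < beta:
        return 0.5 * residual ** 2 / beta
    return abs_r - 0.5 * beta


print(huber_loss(0.5))  # small residual, quadratic zone
print(huber_loss(3.0))  # large residual, linear zone
```

With `delta = beta = 1` the two functions coincide, which is why the names are often used interchangeably.
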
∑ Math · Beginner

Mean Squared Error (MSE)

Mean Squared Error (MSE) measures the average of the squared differences between true values and predictions, punishing larger mistakes more strongly.

#mean squared error #mse #sse (+11 more)
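
To make "punishing larger mistakes more strongly" concrete, here is a small sketch. The function name and the toy numbers are assumptions chosen for illustration: both prediction sets have the same mean absolute error, but the one with a single large error has a much higher MSE.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


targets = [0.0, 0.0, 0.0, 0.0]
many_small_errors = [1.0, 1.0, 1.0, 1.0]  # four errors of size 1
one_large_error = [4.0, 0.0, 0.0, 0.0]    # one error of size 4

print(mean_squared_error(targets, many_small_errors))
print(mean_squared_error(targets, one_large_error))
```
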
∑ Math · Intermediate

Cross-Entropy Loss

Cross-entropy loss measures how well predicted probabilities match the true labels by penalizing confident wrong predictions heavily.

#cross-entropy #binary cross-entropy #softmax (+11 more)
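
A rough sketch of the single-example case, assuming a one-hot true label given as an index. The function name and the `eps` clamp are illustrative assumptions; the clamp only guards against `log(0)`.

```python
import math


def cross_entropy(probs, true_index, eps=1e-12):
    """Negative log of the probability assigned to the true class."""
    return -math.log(max(probs[true_index], eps))


# The true class is index 0 in all three cases.
print(cross_entropy([0.9, 0.1], 0))  # confident and right: small loss
print(cross_entropy([0.5, 0.5], 0))  # unsure: loss equals log(2)
print(cross_entropy([0.1, 0.9], 0))  # confident and wrong: large loss
```
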
∑ Math · Advanced

Evidence Lower Bound (ELBO)

The Evidence Lower Bound (ELBO) is a tractable lower bound on the log evidence log p(x) used to perform approximate Bayesian inference.

#elbo #variational inference #vae (+12 more)
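
The bound can be written out in one line. The latent variable z and the approximating distribution q(z) are notation assumed here for illustration; the page itself only names log p(x). Because the KL term is non-negative, the ELBO never exceeds the log evidence.

```latex
\log p(x)
  = \underbrace{\mathbb{E}_{q(z)}\big[\log p(x, z) - \log q(z)\big]}_{\mathrm{ELBO}(q)}
  + \mathrm{KL}\big(q(z)\,\|\,p(z \mid x)\big)
  \;\ge\; \mathrm{ELBO}(q)
```
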
∑ Math · Intermediate

Discount Factor & Return

The discounted return G_t sums all future rewards but down-weights distant rewards by powers of a discount factor γ.

#discount factor #discounted return #reinforcement learning (+12 more)
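
As a small sketch for a finite reward sequence (the function name and the sample rewards are assumptions), the return can be accumulated backwards using the recursion G_t = r_t + γ · G_{t+1}:

```python
def discounted_return(rewards, gamma):
    """Sum of rewards where the k-th future reward is weighted by gamma**k."""
    g = 0.0
    for r in reversed(rewards):
        # Each step back in time multiplies everything after it by gamma once more.
        g = r + gamma * g
    return g


print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1 + 0.5 + 0.25
```
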
∑ Math · Intermediate

Markov Decision Processes (MDP)

A Markov Decision Process (MDP) models decision-making in situations where outcomes are partly random and partly under the control of a decision maker.

#markov decision process #value iteration #policy iteration (+12 more)
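
A toy illustration of "partly random, partly controlled", solved with value iteration (one of the tags above). The two-state MDP, its state and action names, probabilities, and rewards are all made up for this sketch.

```python
# transitions[state][action] = list of (probability, next_state, reward).
# Hypothetical MDP: the agent picks the action, chance picks the outcome.
transitions = {
    "low": {
        "wait": [(1.0, "low", 0.0)],
        "charge": [(0.8, "high", 0.0), (0.2, "low", 0.0)],
    },
    "high": {
        "wait": [(1.0, "high", 1.0)],
        "work": [(0.5, "high", 2.0), (0.5, "low", 2.0)],
    },
}


def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """Repeat Bellman optimality backups until values change by less than tol."""
    values = {state: 0.0 for state in transitions}
    while True:
        largest_change = 0.0
        for state, actions in transitions.items():
            # Expected return of each action, then keep the best one.
            best = max(
                sum(p * (r + gamma * values[nxt]) for p, nxt, r in outcomes)
                for outcomes in actions.values()
            )
            largest_change = max(largest_change, abs(best - values[state]))
            values[state] = best
        if largest_change < tol:
            return values


print(value_iteration(transitions))
```
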
∑ Math · Advanced

Stochastic Differential Equations for Generation

A forward stochastic differential equation (SDE) models a state that drifts deterministically and is shaken by random Brownian noise over time.

#stochastic differential equation #diffusion model #euler maruyama (+12 more)
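
A minimal Euler-Maruyama sketch (named in the tags above) for a scalar SDE dx = f(x, t) dt + g(x, t) dW. The function signature and the example drift and diffusion terms are assumptions for illustration, not a specific generative model.

```python
import math
import random


def euler_maruyama(x0, drift, diffusion, t_end, n_steps, rng):
    """Simulate one path of dx = drift(x, t) dt + diffusion(x, t) dW."""
    dt = t_end / n_steps
    x = x0
    path = [x0]
    for i in range(n_steps):
        t = i * dt
        # A Brownian increment over dt is Gaussian with standard deviation sqrt(dt).
        dw = rng.gauss(0.0, math.sqrt(dt))
        x = x + drift(x, t) * dt + diffusion(x, t) * dw
        path.append(x)
    return path


# Example: the state drifts toward zero while being shaken by constant-scale noise.
path = euler_maruyama(
    x0=1.0,
    drift=lambda x, t: -x,
    diffusion=lambda x, t: 0.5,
    t_end=1.0,
    n_steps=100,
    rng=random.Random(0),
)
print(path[-1])
```
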
∑ Math · Intermediate

Wasserstein Distance & Optimal Transport

Wasserstein distance (Earth Mover's Distance) measures how much "work" is needed to transform one probability distribution into another by moving mass with minimal total cost.

#wasserstein distance #earth mover's distance #optimal transport (+12 more)
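
In one dimension the idea is easy to sketch. Assuming two equally sized samples with uniform weights, the cheapest transport plan pairs the sorted points in order, so the 1-Wasserstein distance is the mean absolute gap between them. The function name is an assumption, and this shortcut does not apply in higher dimensions.

```python
def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-size, uniformly weighted 1-D samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    # Sorting both samples matches the i-th smallest point of one to the i-th of the other.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


print(wasserstein_1d([0.0, 1.0, 2.0], [3.0, 4.0, 5.0]))  # every unit of mass moves 3
```
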
∑ Math · Intermediate

Softmax & Temperature Scaling

Softmax turns arbitrary real-valued scores (logits) into probabilities that sum to one.

#softmax #temperature scaling #logits (+12 more)
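
A short sketch of softmax with a temperature parameter. The function name and sample logits are assumptions; subtracting the maximum before exponentiating is a standard numerical-stability step that leaves the result unchanged.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; temperature divides the logits first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]  # shift by the max to avoid overflow
    total = sum(exps)
    return [e / total for e in exps]


logits = [1.0, 2.0, 3.0]
print(softmax(logits))                   # baseline
print(softmax(logits, temperature=0.5))  # lower temperature: sharper
print(softmax(logits, temperature=5.0))  # higher temperature: flatter
```
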
∑ Math · Intermediate

Positional Encoding Mathematics

Sinusoidal positional encoding represents each token's position using pairs of sine and cosine waves at exponentially spaced frequencies.

#positional encoding #sinusoidal #transformer (+11 more)
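
A sketch of the encoding for a single position. The function name is an assumption, and the base of 10000 follows the convention from the original Transformer paper rather than anything stated on this page.

```python
import math


def sinusoidal_positional_encoding(position, d_model, base=10000.0):
    """Return d_model values: interleaved (sin, cos) pairs at geometrically spaced frequencies."""
    if d_model % 2 != 0:
        raise ValueError("d_model must be even so sines and cosines pair up")
    encoding = []
    for i in range(d_model // 2):
        # The frequency shrinks exponentially as the pair index i grows.
        frequency = base ** (-2 * i / d_model)
        encoding.append(math.sin(position * frequency))
        encoding.append(math.cos(position * frequency))
    return encoding


print(sinusoidal_positional_encoding(position=0, d_model=4))
print(sinusoidal_positional_encoding(position=3, d_model=4))
```
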
∑ Math · Intermediate

Elastic Net Regularization

Elastic Net regularization combines L1 (Lasso) and L2 (Ridge) penalties to produce models that are both sparse and stable.

#elastic net #lasso #ridge regression (+12 more)
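
A sketch of the combined penalty under one common parameterization: an overall strength `alpha` and a mixing weight `l1_ratio`. Both names and the factor of 0.5 on the L2 term are assumptions here; other texts scale the two terms differently.

```python
def elastic_net_penalty(weights, alpha=1.0, l1_ratio=0.5):
    """alpha * (l1_ratio * sum|w| + 0.5 * (1 - l1_ratio) * sum(w**2))."""
    l1_term = sum(abs(w) for w in weights)  # Lasso part, encourages sparsity
    l2_term = sum(w * w for w in weights)   # Ridge part, encourages small stable weights
    return alpha * (l1_ratio * l1_term + 0.5 * (1.0 - l1_ratio) * l2_term)


weights = [1.0, -2.0]
print(elastic_net_penalty(weights, l1_ratio=1.0))  # pure L1
print(elastic_net_penalty(weights, l1_ratio=0.0))  # pure (half) L2
print(elastic_net_penalty(weights, l1_ratio=0.5))  # a mix of both
```
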
∑ Math · Intermediate

L2 Regularization (Ridge/Weight Decay)

L2 regularization (also called ridge or weight decay) adds a penalty proportional to the sum of squared weights to discourage large parameters.

#l2 regularization #ridge regression #weight decay (+12 more)
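
A minimal sketch of the penalty and its effect on a plain gradient step. The names `lam` (penalty strength) and `lr` (learning rate) are assumptions. With this scaling the penalty's gradient is 2 · lam · w, so every step shrinks each weight toward zero, which is where the name "weight decay" comes from.

```python
def l2_penalty(weights, lam):
    """Penalty proportional to the sum of squared weights."""
    return lam * sum(w * w for w in weights)


def sgd_step_with_l2(weights, grads, lr, lam):
    """One gradient step on (data loss + L2 penalty); grads are the data-loss gradients."""
    return [w - lr * (g + 2.0 * lam * w) for w, g in zip(weights, grads)]


weights = [3.0, -4.0]
print(l2_penalty(weights, lam=0.1))
# With zero data gradient, the step only shrinks the weights.
print(sgd_step_with_l2(weights, grads=[0.0, 0.0], lr=0.1, lam=0.5))
```
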