How I Study AI - Learn AI Papers & Lectures the Easy Way

Concepts (141)

Groups

šŸ“Linear Algebra15šŸ“ˆCalculus & Differentiation10šŸŽÆOptimization14šŸŽ²Probability Theory12šŸ“ŠStatistics for ML9šŸ“”Information Theory10šŸ”ŗConvex Optimization7šŸ”¢Numerical Methods6šŸ•øGraph Theory for Deep Learning6šŸ”µTopology for ML5🌐Differential Geometry6āˆžMeasure Theory & Functional Analysis6šŸŽ°Random Matrix Theory5🌊Fourier Analysis & Signal Processing9šŸŽ°Sampling & Monte Carlo Methods10🧠Deep Learning Theory12šŸ›”ļøRegularization Theory11šŸ‘ļøAttention & Transformer Theory10šŸŽØGenerative Model Theory11šŸ”®Representation Learning10šŸŽ®Reinforcement Learning Mathematics9šŸ”„Variational Methods8šŸ“‰Loss Functions & Objectives10ā±ļøSequence & Temporal Models8šŸ’ŽGeometric Deep Learning8

Category

šŸ”· All · āˆ‘ Math · āš™ļø Algo · šŸ—‚ļø DS · šŸ“š Theory

Level

All · Beginner · Intermediate
āš™ļøAlgorithmIntermediate

Mixed Precision Training

Mixed precision training stores and computes tensors in low precision (FP16/BF16) for speed and memory savings while keeping a master copy of weights in FP32 for accurate updates.

#mixed precision #fp16 #bf16 +10
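For a feel of how this looks in practice, here is a minimal single-step sketch using PyTorch's automatic mixed precision (autocast plus GradScaler); the model, batch shapes, and hyperparameters are made-up placeholders, not an example from this card.

```python
import torch
from torch import nn

# Toy setup: a tiny linear model and a fake batch (illustrative placeholders).
device = "cuda" if torch.cuda.is_available() else "cpu"
amp_dtype = torch.float16 if device == "cuda" else torch.bfloat16
model = nn.Linear(128, 10).to(device)              # weights stay in FP32
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(32, 128, device=device)
y = torch.randint(0, 10, (32,), device=device)

optimizer.zero_grad()
# Forward math runs in FP16/BF16 where it is numerically safe.
with torch.autocast(device_type=device, dtype=amp_dtype):
    loss = nn.functional.cross_entropy(model(x), y)

scaler.scale(loss).backward()   # scale the loss so FP16 gradients do not underflow
scaler.step(optimizer)          # unscale gradients, then apply the FP32 master update
scaler.update()                 # adapt the loss scale for the next step
```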
āš™ļøAlgorithmIntermediate

Distributed & Parallel Optimization

Data parallelism splits the training data across workers that compute gradients in parallel on a shared model.

#data parallelism #synchronous sgd #asynchronous sgd +12
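To make the idea concrete, here is a toy single-process NumPy simulation of synchronous data-parallel SGD; real systems shard the batch across devices and average gradients with an all-reduce (e.g., PyTorch DDP), and every name, shape, and constant below is illustrative.

```python
import numpy as np

# Toy linear-regression problem standing in for a model's loss.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))
true_w = rng.normal(size=8)
y = X @ true_w + 0.01 * rng.normal(size=256)

w = np.zeros(8)                 # shared model, replicated on every "worker"
num_workers, lr = 4, 0.1

for step in range(100):
    shards = np.array_split(np.arange(256), num_workers)  # split the batch across workers
    grads = []
    for idx in shards:                                     # each worker: local gradient on its shard
        err = X[idx] @ w - y[idx]
        grads.append(X[idx].T @ err / len(idx))
    g = np.mean(grads, axis=0)                             # "all-reduce": average the gradients
    w -= lr * g                                            # identical synchronous update everywhere

print(np.allclose(w, true_w, atol=0.05))                   # True: workers converge to one model
```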
āš™ļøAlgorithmIntermediate

Lion Optimizer

Lion (Evolved Sign Momentum) is a first-order, sign-based optimizer discovered through automated program search.

#lion optimizer #sign-based optimization #momentum +12
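Below is an unofficial NumPy sketch of the published Lion update rule, applied to a toy quadratic; the learning rate, betas, and weight decay are illustrative defaults, not tuned values.

```python
import numpy as np

def lion_step(w, g, m, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    """One Lion update: sign of an interpolated momentum, plus decoupled weight decay."""
    update = np.sign(beta1 * m + (1 - beta1) * g)   # interpolate, then keep only the sign
    w = w - lr * (update + wd * w)                  # every coordinate moves by the same magnitude
    m = beta2 * m + (1 - beta2) * g                 # momentum tracked with a second beta
    return w, m

# Toy usage: minimize f(w) = ||w||^2 / 2, whose gradient is w itself.
w, m = np.array([1.0, -2.0, 3.0]), np.zeros(3)
for _ in range(2000):
    w, m = lion_step(w, g=w, m=m, lr=1e-2)
print(np.round(w, 2))   # values oscillate within ~lr of zero
```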
āš™ļøAlgorithmIntermediate

Sharpness-Aware Minimization (SAM)

Sharpness-Aware Minimization (SAM) trains models to perform well even when their weights are slightly perturbed, seeking flatter minima that generalize better.

#sharpness-aware minimization #sam optimizer #robust optimization +11
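A toy NumPy sketch of the two-step SAM update on a simple quadratic; in practice grad_fn would be the network's loss gradient over a minibatch, and rho and lr here are illustrative.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One SAM update: perturb toward the nearby worst case, then descend from there."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # ascend to the locally sharpest nearby point
    g_sharp = grad_fn(w + eps)                   # gradient evaluated at the perturbed weights
    return w - lr * g_sharp                      # descend using the sharpness-aware gradient

# Toy objective f(w) = ||w||^2 / 2 (gradient is w); a real use wraps a network's loss.
w = np.array([2.0, -1.0])
for _ in range(100):
    w = sam_step(w, grad_fn=lambda v: v)
print(np.round(w, 4))   # approaches the (flat) minimum at zero
```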
āš™ļøAlgorithmIntermediate

Sparse Matrices & Computation

A sparse matrix stores only its nonzero entries, saving huge amounts of memory when most entries are zero.

#sparse matrix #csr #csc +12
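A small NumPy illustration of the CSR (compressed sparse row) layout and a matrix-vector product that touches only the stored nonzeros; the example matrix is made up.

```python
import numpy as np

# CSR stores only nonzeros: their values, their column indices, and row pointers.
# Example matrix:
# [[5, 0, 0],
#  [0, 0, 3],
#  [0, 2, 4]]
data    = np.array([5.0, 3.0, 2.0, 4.0])   # nonzero values, row by row
indices = np.array([0,   2,   1,   2  ])   # column index of each value
indptr  = np.array([0, 1, 2, 4])           # row i occupies data[indptr[i]:indptr[i+1]]

def csr_matvec(data, indices, indptr, x):
    """y = A @ x, visiting only the stored nonzeros."""
    y = np.zeros(len(indptr) - 1)
    for i in range(len(y)):
        start, end = indptr[i], indptr[i + 1]
        y[i] = data[start:end] @ x[indices[start:end]]
    return y

x = np.array([1.0, 2.0, 3.0])
print(csr_matvec(data, indices, indptr, x))   # [ 5.  9. 16.]
```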
āš™ļøAlgorithmIntermediate

Dynamic Time Warping

Dynamic Time Warping (DTW) aligns two time series that may vary in speed to find the minimum-cost correspondence between their elements.

#dynamic time warping #dtw c++ #time series alignment +11
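A compact dynamic-programming implementation of DTW in plain Python/NumPy; the two toy sequences differ only in speed, so their DTW distance comes out as zero.

```python
import numpy as np

def dtw_distance(a, b):
    """Minimum-cost alignment between sequences a and b via dynamic programming."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])        # local distance between elements
            D[i, j] = cost + min(D[i - 1, j],      # stretch a
                                 D[i, j - 1],      # stretch b
                                 D[i - 1, j - 1])  # match and advance both
    return D[n, m]

# The second series is a slowed-down copy of the first; warping aligns them at zero cost.
print(dtw_distance([0, 1, 2, 3, 2, 1], [0, 0, 1, 2, 2, 3, 3, 2, 1]))   # 0.0
```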
āš™ļøAlgorithmIntermediate

Expectation Maximization (EM)

Expectation Maximization (EM) is an iterative algorithm to estimate parameters when some variables are hidden or unobserved.

#expectation maximization #em algorithm #e-step +12
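A minimal NumPy sketch of EM fitting a two-component 1-D Gaussian mixture, where the unobserved component labels are the hidden variables; the data, initial guesses, and iteration count are all made up.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Toy data drawn from two 1-D Gaussians with unknown means and mixing weights.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.0, 700)])

# Initial guesses for the mixture parameters.
pi, mu, sigma = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])

for _ in range(50):
    # E-step: posterior responsibility of each component for each point.
    dens = np.stack([p * normal_pdf(x, m, s) for p, m, s in zip(pi, mu, sigma)])
    resp = dens / dens.sum(axis=0)
    # M-step: re-estimate weights, means, and variances from the responsibilities.
    nk = resp.sum(axis=1)
    pi = nk / len(x)
    mu = (resp * x).sum(axis=1) / nk
    sigma = np.sqrt((resp * (x - mu[:, None]) ** 2).sum(axis=1) / nk)

print(np.round(mu, 2), np.round(pi, 2))   # roughly [-2.  3.] and [0.3 0.7]
```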
āš™ļøAlgorithmIntermediate

PPO & Trust Region Methods

Proximal Policy Optimization (PPO) stabilizes policy gradient learning by preventing each update from moving the policy too far from the previous one.

#ppo #trust region #trpo +11
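A small PyTorch sketch of PPO's clipped surrogate loss, the piece that keeps each update close to the previous policy; the log-probabilities and advantages below are made-up stand-ins for logged rollout data.

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO's clipped surrogate objective (returned as a loss to minimize)."""
    ratio = torch.exp(logp_new - logp_old)              # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.mean(torch.min(unclipped, clipped))   # pessimistic bound limits the policy shift

# Toy batch standing in for logged rollout data (values are made up).
logp_old = torch.tensor([-1.0, -0.5, -2.0])
logp_new = torch.tensor([-0.8, -0.9, -1.5], requires_grad=True)
adv = torch.tensor([1.0, -0.5, 2.0])

loss = ppo_clip_loss(logp_new, logp_old, adv)
loss.backward()                                          # gradients flow into the new policy
print(float(loss))
```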
āš™ļøAlgorithmIntermediate

Temporal Difference Learning

Temporal Difference (TD) Learning updates value estimates by bootstrapping from the next state's current estimate, enabling fast, online learning.

#temporal difference learning #td(0) #sarsa +12
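A tiny TD(0) sketch on a made-up two-step chain, showing the bootstrapped update toward r + γV(s'); states, rewards, and episode count are illustrative.

```python
import numpy as np

# Deterministic chain: s0 -> s1 -> s2 (terminal), reward 1 on the final transition.
V = np.zeros(3)             # value estimates for s0, s1, s2
alpha, gamma = 0.1, 0.9

for _ in range(500):        # replay the same episode many times
    for s, s_next, r in [(0, 1, 0.0), (1, 2, 1.0)]:
        target = r + gamma * V[s_next]      # bootstrap from the next state's current estimate
        V[s] += alpha * (target - V[s])     # move the estimate toward the TD target

print(np.round(V, 2))   # approaches [0.9, 1.0, 0.0]
```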
āš™ļøAlgorithmIntermediate

t-SNE & UMAP

t-SNE and UMAP are nonlinear dimensionality-reduction methods that preserve local neighborhoods to make high-dimensional data visible in 2D or 3D.

#t-sne #umap #dimensionality reduction +12
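A usage-level sketch with scikit-learn's TSNE on toy blob data (assumes scikit-learn is installed); UMAP, from the separate umap-learn package, exposes an almost identical fit_transform interface.

```python
import numpy as np
from sklearn.manifold import TSNE   # assumes scikit-learn is installed

# Toy high-dimensional data: two well-separated Gaussian blobs in 50 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 50)), rng.normal(6, 1, (100, 50))])

# Embed into 2D; perplexity controls the size of the neighborhoods being preserved.
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(emb.shape)   # (200, 2) -- the two blobs remain separated in the embedding

# UMAP equivalent (optional umap-learn package):
# import umap; emb = umap.UMAP(n_components=2, n_neighbors=15).fit_transform(X)
```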
āš™ļøAlgorithmIntermediate

Principal Component Analysis (PCA)

Principal Component Analysis (PCA) finds new orthogonal axes (principal components) that capture the maximum variance in your data.

#principal component analysis #pca c++ #eigendecomposition +11
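A NumPy sketch of PCA via eigendecomposition of the sample covariance; the toy data are stretched mostly along one direction, so the first component captures that axis.

```python
import numpy as np

def pca(X, k):
    """Project X onto its top-k principal components (toy eigendecomposition version)."""
    Xc = X - X.mean(axis=0)                       # center each feature
    cov = Xc.T @ Xc / (len(X) - 1)                # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)        # symmetric eigendecomposition, ascending order
    order = np.argsort(eigvals)[::-1][:k]         # top-k directions of maximum variance
    return Xc @ eigvecs[:, order], eigvals[order]

# Toy data with one dominant direction of variance.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.5], [0.5, 1.0]])
Z, var = pca(X, k=1)
print(Z.shape, np.round(var, 2))   # (500, 1) plus the largest variance captured
```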
āš™ļøAlgorithmIntermediate

Efficient Attention Mechanisms

Standard softmax attention costs O(n²) in sequence length because every token compares with every other token; efficient variants such as linear (kernelized) attention approximate it at near-linear cost.

#linear attention #efficient attention #kernel trick +12
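As one illustration of the fix, here is a NumPy sketch of kernelized linear attention using the elu(x)+1 feature map popularized by the "Transformers are RNNs" line of work; the sequence length, head dimension, and values are placeholders.

```python
import numpy as np

def feature_map(x):
    """Positive kernel feature map elu(x)+1, one common choice for linear attention."""
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """Kernelized attention in O(n * d^2) instead of the O(n^2 * d) softmax form."""
    Qf, Kf = feature_map(Q), feature_map(K)   # (n, d) feature-mapped queries and keys
    kv = Kf.T @ V                             # (d, d_v) shared summary of all keys and values
    z = Kf.sum(axis=0)                        # (d,) normalizer accumulated over all keys
    return (Qf @ kv) / (Qf @ z)[:, None]      # each query reads the shared summary once

# Toy shapes: sequence length 6, head dimension 4, random placeholder values.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(6, 4)) for _ in range(3))
print(linear_attention(Q, K, V).shape)   # (6, 4)
```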