How I Study AI - Learn AI Papers & Lectures the Easy Way

Concepts (356)

Groups

๐Ÿ“Linear Algebra15๐Ÿ“ˆCalculus & Differentiation10๐ŸŽฏOptimization14๐ŸŽฒProbability Theory12๐Ÿ“ŠStatistics for ML9๐Ÿ“กInformation Theory10๐Ÿ”บConvex Optimization7๐Ÿ”ขNumerical Methods6๐Ÿ•ธGraph Theory for Deep Learning6๐Ÿ”ตTopology for ML5๐ŸŒDifferential Geometry6โˆžMeasure Theory & Functional Analysis6๐ŸŽฐRandom Matrix Theory5๐ŸŒŠFourier Analysis & Signal Processing9๐ŸŽฐSampling & Monte Carlo Methods10๐Ÿง Deep Learning Theory12๐Ÿ›ก๏ธRegularization Theory11๐Ÿ‘๏ธAttention & Transformer Theory10๐ŸŽจGenerative Model Theory11๐Ÿ”ฎRepresentation Learning10๐ŸŽฎReinforcement Learning Mathematics9๐Ÿ”„Variational Methods8๐Ÿ“‰Loss Functions & Objectives10โฑ๏ธSequence & Temporal Models8๐Ÿ’ŽGeometric Deep Learning8

Theory · Intermediate

Lottery Ticket Hypothesis

The Lottery Ticket Hypothesis (LTH) says that inside a large dense neural network there exist small sparse subnetworks that, when trained in isolation from their original initialization, can reach accuracy comparable to the full model.

#lottery ticket hypothesis #magnitude pruning #sparsity
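
To make "trained in isolation from their original initialization" concrete, here is a minimal sketch of the prune-and-rewind step using magnitude pruning. The function names and the stand-in "training" (a random perturbation of the initial weights) are assumptions for illustration only; a real experiment would train an actual network and then retrain the masked one.

```python
import random

def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that prunes the smallest-magnitude fraction of weights."""
    n_prune = int(len(weights) * sparsity)
    # Indices sorted by ascending magnitude; the first n_prune get pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]

def rewind_to_init(init_weights, mask):
    """Keep surviving weights at their ORIGINAL initial values, zero the rest."""
    return [w * m for w, m in zip(init_weights, mask)]

rng = random.Random(0)
init_weights = [rng.gauss(0.0, 1.0) for _ in range(10)]
# Hypothetical stand-in for training: perturb the initial weights.
trained_weights = [w + rng.gauss(0.0, 0.5) for w in init_weights]

# The mask is chosen from the TRAINED weights, then applied to the INITIAL ones.
mask = magnitude_mask(trained_weights, sparsity=0.8)
ticket = rewind_to_init(init_weights, mask)  # sparse subnetwork to retrain
```

The key design point is that the mask comes from the trained weights while the surviving values are reset to their initial ones, which is what separates a "winning ticket" from an ordinary pruned model.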
Theory · Intermediate

Double Descent Phenomenon

Double descent describes how test error first follows the classic U-shape with increasing model complexity, spikes near the interpolation threshold, and then drops again in the highly overparameterized regime.

#double descent #interpolation threshold #overparameterization
Theory · Intermediate

Depth vs Width Tradeoffs

Depth adds compositional power: stacking layers lets neural networks represent functions with many repeated patterns using far fewer neurons than a single wide layer.

#depth vs width #relu #piecewise linear
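
One way to see the compositional power of depth is the classic tent-map construction, sketched below. Each "layer" is a tent function built from two ReLUs, and stacking it `depth` times gives a sawtooth with 2^depth linear pieces. The names `tent`, `deep_net`, and `count_linear_pieces` are illustrative, and the piece count is measured on a dyadic grid chosen so the arithmetic is exact.

```python
def relu(x):
    return max(0.0, x)

def tent(x):
    """One layer made of two ReLUs: rises 0 -> 1 on [0, 0.5], falls 1 -> 0 on [0.5, 1]."""
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

def deep_net(x, depth):
    """Compose the same two-ReLU layer `depth` times."""
    for _ in range(depth):
        x = tent(x)
    return x

def count_linear_pieces(f, n_grid=1024):
    """Count linear pieces of f on [0, 1] by counting slope sign changes on a grid."""
    ys = [f(i / n_grid) for i in range(n_grid + 1)]
    slopes = [b - a for a, b in zip(ys, ys[1:])]
    changes = sum(1 for s, t in zip(slopes, slopes[1:]) if (s > 0) != (t > 0))
    return changes + 1

for depth in (1, 2, 3, 5):
    pieces = count_linear_pieces(lambda x: deep_net(x, depth))
    print(depth, "layers,", 2 * depth, "ReLUs ->", pieces, "linear pieces")
```

A single hidden layer of n ReLUs on a one-dimensional input can produce at most n + 1 linear pieces, so matching the depth-5 network (10 ReLUs, 32 pieces) with one layer would need at least 31 units.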
โš™๏ธAlgorithmIntermediate

Stratified & Latin Hypercube Sampling

Stratified sampling reduces Monte Carlo variance by dividing the domain into non-overlapping regions (strata) and sampling within each region.

#stratified sampling #latin hypercube sampling #variance reduction
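
The sketch below shows both ideas on the unit interval and unit cube. It assumes equal-width strata with one uniform draw per stratum, which is only one of several possible allocations; the function names are illustrative.

```python
import random

def stratified_estimate(f, n, rng):
    """Estimate the integral of f over [0, 1] with one draw in each of n equal strata."""
    # Stratum i is [i/n, (i+1)/n); (i + u)/n with u ~ U(0, 1) lands inside it.
    return sum(f((i + rng.random()) / n) for i in range(n)) / n

def latin_hypercube(n, dim, rng):
    """Return n points in [0, 1)^dim such that, along every axis,
    each of the n equal-width strata contains exactly one point."""
    columns = []
    for _ in range(dim):
        strata = list(range(n))
        rng.shuffle(strata)  # independent random pairing of strata per axis
        columns.append([(s + rng.random()) / n for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n)]

rng = random.Random(0)
print(stratified_estimate(lambda x: x * x, 1000, rng))  # true value is 1/3
print(latin_hypercube(4, 2, rng))
```

Latin hypercube sampling stratifies each coordinate separately, so it needs only n points in any dimension, whereas a full grid of strata would need n^dim.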
Theory · Intermediate

Reparameterization Trick

The reparameterization trick rewrites a random variable as a deterministic function of noise that does not depend on the parameters, such as z = μ + σ · ε with ε ~ N(0, 1).

#reparameterization trick #pathwise derivative #variational autoencoder
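
Because z is a deterministic function of μ and σ once ε is drawn, gradients can flow through the sample. The sketch below estimates the gradient of E[z²] this way; the objective f(z) = z² is an assumption chosen because the exact answer is known (E[z²] = μ² + σ², so the gradients are 2μ and 2σ).

```python
import random

def reparam_gradients(mu, sigma, n_samples, rng):
    """Pathwise (reparameterized) estimate of d/dmu and d/dsigma of E[z^2],
    where z = mu + sigma * eps and eps ~ N(0, 1)."""
    g_mu = 0.0
    g_sigma = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)  # noise that does not depend on mu or sigma
        z = mu + sigma * eps       # deterministic function of (mu, sigma, eps)
        df_dz = 2.0 * z            # derivative of f(z) = z^2
        g_mu += df_dz * 1.0        # chain rule: dz/dmu = 1
        g_sigma += df_dz * eps     # chain rule: dz/dsigma = eps
    return g_mu / n_samples, g_sigma / n_samples

rng = random.Random(0)
print(reparam_gradients(1.5, 0.5, 100_000, rng))  # exact gradients: (3.0, 1.0)
```

In a variational autoencoder the same chain rule is applied by automatic differentiation rather than by hand.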
โš™๏ธAlgorithmIntermediate

Gibbs Sampling

Gibbs sampling is an MCMC method that generates samples by repeatedly drawing each variable from its conditional distribution given the others.

#gibbs sampling #mcmc #markov chain
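
As a small sketch, take a standard bivariate normal with correlation rho as the target (an assumption chosen because its conditionals are known in closed form: x given y is N(rho · y, 1 − rho²), and symmetrically for y given x). The sampler simply alternates between the two conditionals.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho * rho)  # std dev of each conditional
    x, y = 0.0, 0.0
    samples = []
    for step in range(burn_in + n_samples):
        x = rng.gauss(rho * y, cond_sd)  # draw x from p(x | y)
        y = rng.gauss(rho * x, cond_sd)  # draw y from p(y | x), using the new x
        if step >= burn_in:
            samples.append((x, y))
    return samples

samples = gibbs_bivariate_normal(rho=0.8, n_samples=20_000)
mean_xy = sum(x * y for x, y in samples) / len(samples)
print(mean_xy)  # should be close to rho, since E[xy] = rho for this target
```

The closer rho gets to ±1, the smaller each conditional step becomes and the slower the chain explores, which is a known weakness of Gibbs sampling on strongly correlated targets.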
โš™๏ธAlgorithmIntermediate

Metropolis-Hastings Algorithm

Metropolis-Hastings is a clever accept/reject method that lets you sample from complex probability distributions using only an unnormalized density.

#metropolis-hastings #mcmc #acceptance ratio
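
Here is a minimal sketch with a symmetric Gaussian random-walk proposal. With a symmetric proposal the Hastings correction cancels, so the acceptance ratio reduces to a ratio of target densities (the original Metropolis special case). The target in the usage lines, a standard normal with an arbitrary constant added to its log density, is an assumption chosen to show that the normalizer never matters.

```python
import math
import random

def metropolis_hastings(log_density, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings using only an unnormalized log density."""
    rng = random.Random(seed)
    x, log_px = x0, log_density(x0)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)  # symmetric proposal
        log_pp = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if rng.random() < math.exp(min(0.0, log_pp - log_px)):
            x, log_px = proposal, log_pp
        samples.append(x)  # a rejected proposal repeats the current state
    return samples

# Unnormalized standard normal: the "+ 3.0" is an arbitrary constant that cancels.
draws = metropolis_hastings(lambda x: -0.5 * x * x + 3.0, x0=0.0,
                            n_samples=20_000, step=2.0)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(mean, var)  # should be near 0 and 1
```

Working in log space avoids overflow and underflow when densities are very large or very small.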
โš™๏ธAlgorithmIntermediate

Markov Chain Monte Carlo (MCMC)

MCMC builds a random walk (a Markov chain) whose long-run visiting frequency matches your target distribution, even when the target is only known up to a constant.

#mcmc #metropolis-hastings #gibbs sampling
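
The "long-run visiting frequency" idea is easiest to check on a tiny discrete example. The sketch below walks around a ring of states whose target is given only by unnormalized weights, and compares how often each state is visited against the normalized weights. The ring layout and the function name are assumptions made for illustration.

```python
import random

def visit_frequencies(weights, n_steps, seed=0):
    """Run a Metropolis chain on a ring of len(weights) states and return
    the fraction of steps spent in each state."""
    rng = random.Random(seed)
    k = len(weights)
    state = 0
    counts = [0] * k
    for _ in range(n_steps):
        proposal = (state + rng.choice((-1, 1))) % k  # step to a ring neighbor
        # Only the RATIO of weights is used, so the normalizing constant cancels.
        if rng.random() < min(1.0, weights[proposal] / weights[state]):
            state = proposal
        counts[state] += 1
    return [c / n_steps for c in counts]

weights = [1.0, 2.0, 3.0, 4.0]  # unnormalized target
freqs = visit_frequencies(weights, 100_000)
target = [w / sum(weights) for w in weights]
print(freqs)
print(target)  # [0.1, 0.2, 0.3, 0.4]
```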
โš™๏ธAlgorithmIntermediate

Rejection Sampling

Rejection sampling draws from a hard target distribution p by sampling x from an easier proposal q and accepting it with probability p(x)/(M q(x)), where M is a constant chosen so that p(x) ≤ M q(x) for every x.

#rejection sampling #accept-reject #proposal distribution
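
A short sketch of the accept/reject loop follows. The example target p(x) = 2x on [0, 1] with a uniform proposal and M = 2 is an assumption chosen so that the envelope condition p(x) ≤ M q(x) clearly holds; the function name and signature are illustrative.

```python
import random

def rejection_sample(p, q_pdf, q_sample, M, n_samples, rng):
    """Draw n_samples from density p using proposal q, assuming p(x) <= M * q(x).
    Returns the accepted samples and the total number of proposals made."""
    samples = []
    n_proposed = 0
    while len(samples) < n_samples:
        x = q_sample(rng)
        n_proposed += 1
        # Accept x with probability p(x) / (M q(x)).
        if rng.random() < p(x) / (M * q_pdf(x)):
            samples.append(x)
    return samples, n_proposed

rng = random.Random(0)
samples, n_proposed = rejection_sample(
    p=lambda x: 2.0 * x,            # target density on [0, 1]
    q_pdf=lambda x: 1.0,            # uniform proposal density
    q_sample=lambda r: r.random(),  # draw from the proposal
    M=2.0,
    n_samples=10_000,
    rng=rng,
)
print(sum(samples) / len(samples))  # true mean of p is 2/3
print(len(samples) / n_proposed)    # acceptance rate, about 1/M when p and q are normalized
```

The acceptance rate of roughly 1/M is why a tight envelope matters: a loose M wastes most proposals.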
โš™๏ธAlgorithmIntermediate

Importance Sampling

Importance sampling rewrites an expectation under a hard-to-sample distribution p as an expectation under an easier distribution q, multiplied by a weight w = p/q.

#importance sampling #proposal distribution #self-normalized
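
The sketch below applies the weight w = p/q to a case where plain sampling from p works poorly: estimating the tail probability P(X > 4) for X ~ N(0, 1). The choice of proposal q = N(4, 1) and the helper names are assumptions for illustration; both densities here are fully normalized, so no self-normalization is needed.

```python
import math
import random

def log_normal_pdf(x, mean):
    """Log density of a unit-variance normal with the given mean."""
    return -0.5 * (x - mean) ** 2 - 0.5 * math.log(2.0 * math.pi)

def importance_estimate(f, log_p, log_q, sample_q, n, rng):
    """Estimate E_p[f(X)] as the average of f(x) * p(x)/q(x) over draws x ~ q."""
    total = 0.0
    for _ in range(n):
        x = sample_q(rng)
        w = math.exp(log_p(x) - log_q(x))  # importance weight w = p/q
        total += f(x) * w
    return total / n

rng = random.Random(0)
estimate = importance_estimate(
    f=lambda x: 1.0 if x > 4.0 else 0.0,
    log_p=lambda x: log_normal_pdf(x, 0.0),  # target p = N(0, 1)
    log_q=lambda x: log_normal_pdf(x, 4.0),  # proposal q = N(4, 1)
    sample_q=lambda r: r.gauss(4.0, 1.0),
    n=50_000,
    rng=rng,
)
exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))  # closed-form normal tail
print(estimate, exact)
```

Sampling from p directly would see the event X > 4 only a few times in 100,000 draws, while the shifted proposal lands there about half the time and the weights correct for the shift.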
โš™๏ธAlgorithmIntermediate

Monte Carlo Estimation

Monte Carlo estimation approximates an expected value by averaging function values at random samples drawn from a probability distribution.

#monte carlo #expectation #variance reduction
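
In code, the estimator is just a sample mean, and the sample variance gives a rough error bar. The sketch below is a minimal version; the function name and the test integrand E[X²] for X ~ U(0, 1) (true value 1/3) are assumptions for illustration.

```python
import math
import random

def mc_estimate(f, sampler, n, rng):
    """Return the Monte Carlo estimate of E[f(X)] and its standard error."""
    values = [f(sampler(rng)) for _ in range(n)]
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # unbiased sample variance
    return mean, math.sqrt(var / n)  # standard error shrinks like 1/sqrt(n)

rng = random.Random(0)
mean, std_err = mc_estimate(lambda x: x * x, lambda r: r.random(), 100_000, rng)
print(mean, "+/-", std_err)  # true value is 1/3
```

Because the standard error falls as 1/sqrt(n), halving the error costs four times as many samples, which is the motivation for variance reduction methods.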
Theory · Intermediate

Spectral Normalization

Spectral normalization rescales a weight matrix so its largest singular value (spectral norm) is at most a target value, typically 1.

#spectral normalization #spectral norm #singular value
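
The largest singular value is usually estimated by power iteration rather than a full decomposition. Below is a minimal pure-Python sketch of that idea. It assumes a nonzero matrix stored as a list of rows and always divides by the estimated norm, so the result has spectral norm equal to the target; the function names are illustrative, not a library API.

```python
import math
import random

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def vec_norm(v):
    return math.sqrt(sum(x * x for x in v))

def spectral_norm(W, n_iters=100, seed=0):
    """Estimate the largest singular value of W by power iteration on W^T W."""
    rng = random.Random(seed)
    Wt = [list(col) for col in zip(*W)]  # transpose of W
    v = [rng.gauss(0.0, 1.0) for _ in range(len(W[0]))]
    for _ in range(n_iters):
        v = matvec(Wt, matvec(W, v))  # one step: v <- W^T W v
        scale = vec_norm(v)
        v = [x / scale for x in v]    # keep v at unit length
    return vec_norm(matvec(W, v))     # ||W v|| for the top right singular vector

def spectral_normalize(W, target=1.0):
    """Rescale W so that its spectral norm equals `target`."""
    sigma = spectral_norm(W)
    return [[w * target / sigma for w in row] for row in W]

W = [[3.0, 0.0], [4.0, 5.0]]
print(spectral_norm(W))                      # sqrt(45), about 6.708
print(spectral_norm(spectral_normalize(W)))  # about 1.0
```

Deep learning libraries typically run only one or a few power-iteration steps per training update and carry the vector over between updates, trading exactness for speed.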