How I Study AI - Learn AI Papers & Lectures the Easy Way

Concepts (172)

Groups

๐Ÿ“Linear Algebra15๐Ÿ“ˆCalculus & Differentiation10๐ŸŽฏOptimization14๐ŸŽฒProbability Theory12๐Ÿ“ŠStatistics for ML9๐Ÿ“กInformation Theory10๐Ÿ”บConvex Optimization7๐Ÿ”ขNumerical Methods6๐Ÿ•ธGraph Theory for Deep Learning6๐Ÿ”ตTopology for ML5๐ŸŒDifferential Geometry6โˆžMeasure Theory & Functional Analysis6๐ŸŽฐRandom Matrix Theory5๐ŸŒŠFourier Analysis & Signal Processing9๐ŸŽฐSampling & Monte Carlo Methods10๐Ÿง Deep Learning Theory12๐Ÿ›ก๏ธRegularization Theory11๐Ÿ‘๏ธAttention & Transformer Theory10๐ŸŽจGenerative Model Theory11๐Ÿ”ฎRepresentation Learning10๐ŸŽฎReinforcement Learning Mathematics9๐Ÿ”„Variational Methods8๐Ÿ“‰Loss Functions & Objectives10โฑ๏ธSequence & Temporal Models8๐Ÿ’ŽGeometric Deep Learning8

Theory · Advanced

Transformer Expressiveness

Transformer expressiveness studies what kinds of sequence-to-sequence mappings a Transformer can represent or approximate.

#transformer expressiveness#universal approximation#self-attention+12
Theory · Advanced

Feature Learning vs Kernel Regime

The kernel (lazy) regime keeps neural network parameters close to their initialization, making training equivalent to kernel regression with a fixed kernel such as the Neural Tangent Kernel (NTK).

#neural tangent kernel#kernel ridge regression#lazy training+12
Theory · Advanced

Mean Field Theory of Neural Networks

Mean field theory treats very wide, randomly initialized neural networks as averaging machines in which each neuron behaves like a sample from a common distribution.

#mean field theory#neural tangent kernel#neural network gaussian process+12
Theory · Advanced

Information Bottleneck in Deep Learning

The Information Bottleneck (IB) principle formalizes learning compact representations T that keep only the information about X that is useful for predicting Y.

#information bottleneck#variational information bottleneck#mutual information+11
Theory · Advanced

Generalization Bounds for Deep Learning

Generalization bounds explain why deep neural networks can perform well on unseen data despite having many parameters.

#generalization bounds#pac-bayes#compression bounds+12
Theory · Advanced

Neural Tangent Kernel (NTK)

The Neural Tangent Kernel (NTK) describes how wide neural networks train like kernel machines: in the infinite-width limit, gradient descent becomes kernel regression.

#neural tangent kernel#ntk#nngp+12
โš™๏ธAlgorithmAdvanced

Langevin Dynamics & Score-Based Sampling

Langevin dynamics is a noisy gradient-ascent method on the log-density: it moves particles toward high-probability regions while adding Gaussian noise so that they keep exploring the distribution.

#langevin dynamics#mala#ula+12
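
The update described above can be sketched in a few lines. This is a minimal illustration of the unadjusted Langevin algorithm (ULA), not code from this site; the standard normal target, the function names, and the step size are all assumptions chosen to keep the example small.

```python
import numpy as np

def grad_log_density(x):
    """Score of the assumed target, a standard normal: d/dx log N(x; 0, 1) = -x."""
    return -x

def unadjusted_langevin(n_particles=2000, n_steps=500, step_size=0.05, seed=0):
    """Run ULA on independent particles and return their final positions."""
    rng = np.random.default_rng(seed)
    # Arbitrary starting points, deliberately far from the target's bulk.
    x = rng.uniform(-5.0, 5.0, size=n_particles)
    for _ in range(n_steps):
        noise = rng.standard_normal(n_particles)
        # Drift up the log-density gradient, then add Gaussian noise.
        x = x + step_size * grad_log_density(x) + np.sqrt(2.0 * step_size) * noise
    return x

langevin_samples = unadjusted_langevin()
print(langevin_samples.mean(), langevin_samples.var())
```

Because ULA has no accept/reject step, a finite step size leaves a small bias in the stationary distribution; a Metropolis-adjusted variant (MALA) removes that bias at the cost of a correction step.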
โš™๏ธAlgorithmAdvanced

Hamiltonian Monte Carlo (HMC)

Hamiltonian Monte Carlo (HMC) uses gradients of the log-density to propose long-distance moves that still land in high-probability regions.

#hamiltonian monte carlo#hmc#mcmc+11
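
A hedged sketch of one way to implement this idea: leapfrog integration followed by a Metropolis accept/reject test. The standard normal target, the helper names, and the tuning values (`step_size`, `n_leapfrog`) are illustrative assumptions, not settings taken from this site.

```python
import numpy as np

def log_density(q):
    """Assumed target: standard normal log-density, up to an additive constant."""
    return -0.5 * np.sum(q ** 2)

def grad_log_density(q):
    return -q

def hmc_step(q, rng, step_size=0.2, n_leapfrog=10):
    """One HMC transition: sample momentum, simulate, then accept or reject."""
    p = rng.standard_normal(q.shape)
    current_h = -log_density(q) + 0.5 * np.sum(p ** 2)

    # Leapfrog integration of Hamiltonian dynamics.
    q_new = q.copy()
    p_new = p + 0.5 * step_size * grad_log_density(q_new)  # initial half step
    for i in range(n_leapfrog):
        q_new = q_new + step_size * p_new
        if i < n_leapfrog - 1:
            p_new = p_new + step_size * grad_log_density(q_new)
    p_new = p_new + 0.5 * step_size * grad_log_density(q_new)  # final half step

    proposed_h = -log_density(q_new) + 0.5 * np.sum(p_new ** 2)
    # The Metropolis test corrects for the integrator's discretization error.
    if np.log(rng.uniform()) < current_h - proposed_h:
        return q_new, True
    return q, False

def run_hmc(n_samples=2000, dim=2, seed=0):
    rng = np.random.default_rng(seed)
    q = np.full(dim, 3.0)  # arbitrary starting point
    samples = np.empty((n_samples, dim))
    n_accept = 0
    for t in range(n_samples):
        q, accepted = hmc_step(q, rng)
        n_accept += accepted
        samples[t] = q
    return samples, n_accept / n_samples

hmc_samples, accept_rate = run_hmc()
print(accept_rate, hmc_samples[200:].mean(axis=0))
```

The trajectory length (`step_size * n_leapfrog`) controls how far each proposal travels; the leapfrog integrator nearly conserves the Hamiltonian, which is why distant proposals are still accepted often.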
Theory · Advanced

Spectral Convolution on Graphs

Spectral convolution on graphs generalizes the classical notion of convolution using the graph’s Laplacian eigenvectors as “Fourier” basis functions.

#spectral graph theory#graph fourier transform#laplacian eigenvectors+12
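
As a rough illustration of the description above, the sketch below filters a signal on a small graph by transforming it into the Laplacian eigenbasis, scaling each frequency component, and transforming back. The 4-node cycle graph, the heat-kernel filter, and the function names are assumptions made for the example.

```python
import numpy as np

def graph_laplacian(adjacency):
    """Combinatorial Laplacian L = D - A of an undirected graph."""
    degree = np.diag(adjacency.sum(axis=1))
    return degree - adjacency

def spectral_filter(adjacency, signal, filter_fn):
    """Apply a spectral filter h(lambda) to a graph signal."""
    laplacian = graph_laplacian(adjacency)
    # eigh applies because the Laplacian of an undirected graph is symmetric.
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
    signal_hat = eigenvectors.T @ signal               # graph Fourier transform
    filtered_hat = filter_fn(eigenvalues) * signal_hat  # scale each frequency
    return eigenvectors @ filtered_hat                  # inverse transform

# A 4-node cycle graph with a unit impulse on node 0.
adjacency = np.array([
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
], dtype=float)
signal = np.array([1.0, 0.0, 0.0, 0.0])

# Low-pass heat-kernel filter: damps high graph frequencies.
smoothed = spectral_filter(adjacency, signal, lambda lam: np.exp(-0.5 * lam))
print(smoothed)
```

The full eigendecomposition costs cubic time in the number of nodes, which is one reason practical graph networks often replace it with polynomial approximations of the filter.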
Theory · Advanced

Random Matrix Theory in High-Dimensional Statistics

Random Matrix Theory (RMT) explains how eigenvalues of large random matrices behave when the dimension p is comparable to the sample size n.

#random matrix theory#marchenko-pastur#wigner semicircle+12
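
A small numerical check can make this concrete. The sketch below compares the eigenvalues of a sample covariance matrix with the support edges predicted by the Marchenko-Pastur law; the Gaussian data with identity covariance, the sizes n = 2000 and p = 500, and the function names are assumptions for illustration.

```python
import numpy as np

def sample_covariance_eigenvalues(n_samples, dim, seed=0):
    """Eigenvalues of the sample covariance of i.i.d. standard normal data."""
    rng = np.random.default_rng(seed)
    data = rng.standard_normal((n_samples, dim))
    covariance = data.T @ data / n_samples
    return np.linalg.eigvalsh(covariance)

def marchenko_pastur_edges(aspect_ratio):
    """Support edges (1 -/+ sqrt(p/n))^2 for unit-variance entries, p/n <= 1."""
    root = np.sqrt(aspect_ratio)
    return (1.0 - root) ** 2, (1.0 + root) ** 2

n_samples, dim = 2000, 500
eigenvalues = sample_covariance_eigenvalues(n_samples, dim)
lower_edge, upper_edge = marchenko_pastur_edges(dim / n_samples)
print(eigenvalues.min(), eigenvalues.max(), lower_edge, upper_edge)
```

Although the true covariance is the identity (every population eigenvalue equals 1), the sample eigenvalues spread across roughly the whole interval between the two edges, which is the effect that appears when p is comparable to n.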
Theory · Advanced

Spectral Analysis of Neural Networks

Spectral analysis studies the distribution of eigenvalues and singular values of neural network weight matrices during training.

#spectral analysis#eigenvalues#singular values+12
Math · Advanced

Free Probability Theory

Free probability studies "random variables" that do not commute, where independence is replaced by freeness and noncrossing combinatorics replaces classical partitions.

#free probability#freeness#r-transform+11