How I Study AI - Learn AI Papers & Lectures the Easy Way

Concepts: 28

Groups

Linear Algebra (15) · Calculus & Differentiation (10) · Optimization (14) · Probability Theory (12) · Statistics for ML (9) · Information Theory (10) · Convex Optimization (7) · Numerical Methods (6) · Graph Theory for Deep Learning (6) · Topology for ML (5) · Differential Geometry (6) · Measure Theory & Functional Analysis (6) · Random Matrix Theory (5) · Fourier Analysis & Signal Processing (9) · Sampling & Monte Carlo Methods (10) · Deep Learning Theory (12) · Regularization Theory (11) · Attention & Transformer Theory (10) · Generative Model Theory (11) · Representation Learning (10) · Reinforcement Learning Mathematics (9) · Variational Methods (8) · Loss Functions & Objectives (10) · Sequence & Temporal Models (8) · Geometric Deep Learning (8)

Theory · Intermediate

Knowledge Distillation Loss

Knowledge distillation loss blends standard hard-label cross-entropy with a soft term that matches the student's temperature-softened output distribution to the teacher's.

#knowledge distillation · #kd loss · #temperature scaling
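
A minimal PyTorch sketch of the combined objective (the function name and the defaults T = 4.0 and alpha = 0.5 are illustrative, not prescribed by the card):

import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Hard-label term: ordinary cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    # Soft term: KL between temperature-softened teacher and student
    # distributions; the T*T factor keeps its gradients on the same
    # scale as the hard term.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # alpha trades off imitating the teacher against fitting the labels.
    return alpha * soft + (1.0 - alpha) * hard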
Math · Intermediate

Cross-Entropy Loss

Cross-entropy loss measures how well predicted probabilities match the true labels by penalizing confident wrong predictions heavily.

#cross-entropy · #kl divergence · #binary cross-entropy · #softmax
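
A worked NumPy example of the penalty asymmetry described above:

import numpy as np

def cross_entropy(probs, label):
    # -log p(true class): near zero when confident and right,
    # very large when confident and wrong.
    return -np.log(probs[label])

p = np.array([0.9, 0.05, 0.05])
print(cross_entropy(p, 0))  # ~0.105: confident and correct
print(cross_entropy(p, 1))  # ~3.0: confident and wrong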
โš™๏ธAlgorithmAdvanced

Wake-Sleep Algorithm

The Wake-Sleep algorithm trains a pair of models: a generative model that explains how data are produced and a recognition model that guesses hidden causes from observed data.

#wake-sleep · #helmholtz machine · #generative model
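
A minimal sketch of the two phases for a one-latent-layer Helmholtz machine with binary units (the sizes, random stand-in data, learning rate, and step count are all illustrative):

import torch

D, H = 16, 8                                  # observed and latent sizes
gen = torch.nn.Linear(H, D)                   # generative model p(x|z)
rec = torch.nn.Linear(D, H)                   # recognition model q(z|x)
prior = torch.zeros(H, requires_grad=True)    # logits of the prior p(z)

opt_gen = torch.optim.SGD([*gen.parameters(), prior], lr=0.05)
opt_rec = torch.optim.SGD(rec.parameters(), lr=0.05)
bce = torch.nn.BCEWithLogitsLoss()
x = (torch.rand(64, D) > 0.5).float()         # stand-in data batch

for _ in range(200):
    # Wake phase: recognize hidden causes of real data, then train the
    # generative path (and prior) to reproduce the data from them.
    with torch.no_grad():
        z = torch.bernoulli(torch.sigmoid(rec(x)))
    loss_gen = bce(gen(z), x) + bce(prior.expand(64, H), z)
    opt_gen.zero_grad(); loss_gen.backward(); opt_gen.step()

    # Sleep phase: dream (z, x) pairs from the generative model, then
    # train the recognition model to recover the causes of each dream.
    with torch.no_grad():
        z_dream = torch.bernoulli(torch.sigmoid(prior).expand(64, H))
        x_dream = torch.bernoulli(torch.sigmoid(gen(z_dream)))
    loss_rec = bce(rec(x_dream), z_dream)
    opt_rec.zero_grad(); loss_rec.backward(); opt_rec.step()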
Theory · Advanced

Variational Dropout & Bayesian Deep Learning

Dropout can be interpreted as variational inference in a Bayesian neural network, where applying random masks approximates sampling from a posterior over weights.

#bayesian neural networks · #variational inference · #dropout
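
One practical consequence is Monte Carlo dropout: keep the masks active at test time and average several stochastic forward passes to approximate the posterior predictive. A minimal sketch (the model and sizes are illustrative):

import torch

model = torch.nn.Sequential(
    torch.nn.Linear(10, 64), torch.nn.ReLU(),
    torch.nn.Dropout(p=0.2),
    torch.nn.Linear(64, 1),
)

x = torch.randn(5, 10)
model.train()  # leave dropout on: each pass samples a new weight mask
with torch.no_grad():
    preds = torch.stack([model(x) for _ in range(100)])

mean = preds.mean(0)   # predictive mean
std = preds.std(0)     # spread across masks ~ model uncertainty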
Math · Advanced

Evidence Lower Bound (ELBO)

The Evidence Lower Bound (ELBO) is a tractable lower bound on the log evidence log p(x) used to perform approximate Bayesian inference.

#elbo · #variational inference · #vae
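
The bound comes from one standard decomposition of the evidence; because the KL term is nonnegative, dropping it leaves a lower bound, and the gap closes exactly when q(z) matches the true posterior:

\begin{align}
\log p(x)
  &= \mathbb{E}_{q(z)}\!\left[\log \tfrac{p(x,z)}{q(z)}\right]
     + \mathrm{KL}\big(q(z)\,\|\,p(z\mid x)\big) \\
  &\ge \underbrace{\mathbb{E}_{q(z)}\big[\log p(x\mid z)\big]
     - \mathrm{KL}\big(q(z)\,\|\,p(z)\big)}_{\text{ELBO}}
\end{align}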
Theory · Intermediate

Variational Inference

Variational Inference (VI) turns Bayesian inference into an optimization problem by choosing a simple family q(z) to approximate an intractable posterior p(z|x).

#variational inference · #elbo · #kl divergence
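
A minimal sketch of that optimization (the N(2, 0.5) target, sample size, and step count are illustrative): fit q(z) = N(mu, sigma^2) by maximizing a Monte Carlo estimate of the ELBO with the reparameterization trick:

import math
import torch

def log_p_unnorm(z):
    # Unnormalized log posterior; here a N(2, 0.5) target.
    return -0.5 * (z - 2.0) ** 2 / 0.5

mu = torch.zeros(1, requires_grad=True)
log_sigma = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([mu, log_sigma], lr=0.05)

for _ in range(500):
    eps = torch.randn(256)
    z = mu + log_sigma.exp() * eps           # reparameterized samples from q
    entropy = log_sigma + 0.5 * math.log(2 * math.pi * math.e)
    elbo = log_p_unnorm(z).mean() + entropy  # E_q[log p] + H(q)
    opt.zero_grad(); (-elbo).backward(); opt.step()

print(mu.item(), log_sigma.exp().item())     # ~2.0 and ~sqrt(0.5)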
โš™๏ธAlgorithmIntermediate

PPO & Trust Region Methods

Proximal Policy Optimization (PPO) stabilizes policy gradient learning by preventing each update from moving the policy too far from the previous one.

#ppo · #trust region · #trpo
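
The core of PPO is a clipped surrogate loss; a sketch (the function name is ours, and eps = 0.2 is the commonly used default):

import torch

def ppo_clip_loss(new_logp, old_logp, advantages, eps=0.2):
    # Probability ratio between the updated policy and the one that
    # collected the data.
    ratio = (new_logp - old_logp).exp()
    # Clipping removes any incentive to push the ratio outside
    # [1 - eps, 1 + eps], so each update stays near the old policy.
    unclipped = ratio * advantages
    clipped = ratio.clamp(1 - eps, 1 + eps) * advantages
    return -torch.min(unclipped, clipped).mean()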
โš™๏ธAlgorithmIntermediate

t-SNE & UMAP

t-SNE and UMAP are nonlinear dimensionality-reduction methods that preserve local neighborhoods to make high-dimensional data visible in 2D or 3D.

#t-sne · #umap · #dimensionality reduction
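
In practice both are a single call; a sketch on random stand-in data (the perplexity and neighbor counts are illustrative knobs for neighborhood size):

import numpy as np
from sklearn.manifold import TSNE

X = np.random.randn(500, 50)          # 500 points in 50 dimensions
emb = TSNE(n_components=2, perplexity=30).fit_transform(X)
print(emb.shape)                      # (500, 2), ready to scatter-plot

# UMAP has a near-identical interface via the separate umap-learn package:
# import umap
# emb = umap.UMAP(n_components=2, n_neighbors=15).fit_transform(X)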
Theory · Advanced

Disentangled Representations

Disentangled representations aim to encode independent factors of variation (like shape, size, or color) into separate coordinates of a latent vector.

#disentangled representations · #independent factors · #total correlation
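
One common route to such codes is the beta-VAE objective, which upweights the KL pull toward a factorized prior; a sketch (beta = 4 is illustrative, and recon_loss, mu, log_var are assumed to come from an encoder defined elsewhere):

import torch

def beta_vae_loss(recon_loss, mu, log_var, beta=4.0):
    # Closed-form KL from the diagonal-Gaussian encoder q(z|x) to the
    # standard-normal prior, summed over latent dimensions.
    kl = -0.5 * (1 + log_var - mu.pow(2) - log_var.exp()).sum(dim=-1)
    # beta > 1 strengthens the pressure toward the factorized prior,
    # which empirically encourages axis-aligned, disentangled factors.
    return recon_loss + beta * kl.mean()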
Theory · Advanced

Variational Autoencoders (VAE) Theory

A Variational Autoencoder (VAE) is a probabilistic autoencoder that learns to generate data by inferring hidden causes (latent variables) and decoding them back to observations.

#variational autoencoder · #elbo · #kl divergence
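
A minimal sketch of those pieces (layer sizes are illustrative): an encoder infers a Gaussian over latent causes, the reparameterization trick makes the sample differentiable, and a decoder maps it back to an observation:

import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=16):
        super().__init__()
        self.enc = nn.Linear(x_dim, 2 * z_dim)  # outputs [mu, log_var]
        self.dec = nn.Linear(z_dim, x_dim)

    def forward(self, x):
        mu, log_var = self.enc(x).chunk(2, dim=-1)
        z = mu + (0.5 * log_var).exp() * torch.randn_like(mu)  # reparameterize
        return self.dec(z), mu, log_var

recon, mu, log_var = TinyVAE()(torch.rand(32, 784))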
Theory · Intermediate

Maximum Likelihood & Generative Models

Maximum Likelihood Estimation (MLE) picks parameters that make the observed data most probable under a chosen probabilistic model.

#maximum likelihood · #generative models · #naive bayes
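
For a Gaussian model the MLE has a closed form: the sample mean and the (biased, divide-by-n) sample variance. A sketch on synthetic data:

import numpy as np

x = np.random.normal(loc=3.0, scale=2.0, size=10_000)  # illustrative data
mu_hat = x.mean()                     # maximizes the log-likelihood in mu
var_hat = ((x - mu_hat) ** 2).mean()  # maximizes it in sigma^2
print(mu_hat, np.sqrt(var_hat))       # close to 3.0 and 2.0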
Theory · Advanced

Information Bottleneck in Deep Learning

The Information Bottleneck (IB) principle formalizes learning compact representations T that keep only the information about X that is useful for predicting Y.

#information bottleneck · #variational information bottleneck · #mutual information
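
In symbols, the IB objective asks the representation T to forget the input except for what predicts the label, with a multiplier beta setting the trade-off:

% Compress X into T while preserving what T says about Y.
\min_{p(t \mid x)} \; I(X;T) \;-\; \beta\, I(T;Y)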