How I Study AI - Learn AI Papers & Lectures the Easy Way

Concepts (12)
Filtering by: #stochastic gradient descent

Groups

📐 Linear Algebra (15) · 📈 Calculus & Differentiation (10) · 🎯 Optimization (14) · 🎲 Probability Theory (12) · 📊 Statistics for ML (9) · 📡 Information Theory (10) · 🔺 Convex Optimization (7) · 🔢 Numerical Methods (6) · 🕸 Graph Theory for Deep Learning (6) · 🔵 Topology for ML (5) · 🌐 Differential Geometry (6) · ∞ Measure Theory & Functional Analysis (6) · 🎰 Random Matrix Theory (5) · 🌊 Fourier Analysis & Signal Processing (9) · 🎰 Sampling & Monte Carlo Methods (10) · 🧠 Deep Learning Theory (12) · 🛡️ Regularization Theory (11) · 👁️ Attention & Transformer Theory (10) · 🎨 Generative Model Theory (11) · 🔮 Representation Learning (10) · 🎮 Reinforcement Learning Mathematics (9) · 🔄 Variational Methods (8) · 📉 Loss Functions & Objectives (10) · ⏱️ Sequence & Temporal Models (8) · 💎 Geometric Deep Learning (8)

⚙️ Algorithm · Intermediate

Lion Optimizer

Lion (Evolved Sign Momentum) is a first-order, sign-based optimizer discovered through automated program search.

#lion optimizer · #sign-based optimization · #momentum · +12
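
A minimal NumPy sketch of the sign-based update on a single parameter array; the function name and the hyperparameter defaults are illustrative choices, not the tuned values from the paper.

```python
import numpy as np

def lion_step(w, m, grad, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.0):
    """One Lion update: take the sign of an interpolated momentum/gradient
    direction, apply decoupled weight decay, then refresh the momentum buffer."""
    update = np.sign(beta1 * m + (1.0 - beta1) * grad)  # only the sign of each coordinate is used
    w = w - lr * (update + weight_decay * w)            # decoupled (AdamW-style) weight decay
    m = beta2 * m + (1.0 - beta2) * grad                 # momentum is an EMA of raw gradients
    return w, m
```
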
⚙️ Algorithm · Intermediate

Sharpness-Aware Minimization (SAM)

Sharpness-Aware Minimization (SAM) trains models to perform well even when their weights are slightly perturbed, seeking flatter minima that generalize better.

#sharpness-aware minimization · #sam optimizer · #robust optimization · +11
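
A rough sketch of one SAM step, assuming a `grad_fn(w)` callable that returns the loss gradient; the base optimizer here is plain gradient descent for brevity, whereas SAM is normally wrapped around SGD or Adam.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One SAM step: perturb the weights toward higher loss,
    then descend using the gradient measured at the perturbed point."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # ascent direction scaled to radius rho
    g_sharp = grad_fn(w + eps)                    # gradient at the "sharpness probe" point
    return w - lr * g_sharp                       # update the original weights, not w + eps
```
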
∑ Math · Intermediate

Surrogate Loss Theory

The 0-1 loss directly measures classification error but is discontinuous and non-convex, making direct optimization computationally hard; surrogate losses replace it with convex, tractable alternatives such as the hinge or logistic loss.

#surrogate loss · #0-1 loss · #hinge loss · +12
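
A small comparison of the 0-1 loss with two common convex surrogates, written in terms of the margin m = y·f(x); counting a margin of exactly 0 as an error is an assumed convention for illustration.

```python
import numpy as np

def zero_one_loss(margin):
    """1 if misclassified (margin <= 0), else 0; discontinuous and non-convex."""
    return (margin <= 0).astype(float)

def hinge_loss(margin):
    """max(0, 1 - m): a convex upper bound on the 0-1 loss (used by SVMs)."""
    return np.maximum(0.0, 1.0 - margin)

def logistic_loss(margin):
    """log(1 + exp(-m)): a smooth convex surrogate (used by logistic regression)."""
    return np.log1p(np.exp(-margin))

margins = np.array([-1.0, 0.0, 0.5, 2.0])
print(zero_one_loss(margins), hinge_loss(margins), logistic_loss(margins))
```
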
📚 Theory · Intermediate

Value Function Approximation

Value function approximation replaces a huge table of values with a small set of parameters that can generalize across similar states.

#reinforcement learning · #value function approximation · #linear function approximator · +12
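
A sketch of semi-gradient TD(0) with a linear approximator v(s) ≈ θᵀφ(s); `phi_s` and `phi_s_next` stand for hypothetical feature vectors of the current and next state.

```python
import numpy as np

def td0_linear_update(theta, phi_s, phi_s_next, reward, alpha=0.1, gamma=0.99, done=False):
    """Semi-gradient TD(0): move theta toward the bootstrapped target."""
    v_s = theta @ phi_s                           # current estimate v(s)
    v_next = 0.0 if done else theta @ phi_s_next  # bootstrap from the next state
    td_error = reward + gamma * v_next - v_s
    return theta + alpha * td_error * phi_s       # gradient of v(s) w.r.t. theta is phi(s)
```

Because the same θ is shared across all states, an update made in one state also shifts the value estimates of states with similar features, which is the generalization the card describes.
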
📚 Theory · Intermediate

Grokking & Delayed Generalization

Grokking is the phenomenon in which a model suddenly starts to generalize well long after it has already memorized the training set.

#grokking · #delayed generalization · #weight decay · +12
📚 Theory · Intermediate

Empirical Risk Minimization

Empirical Risk Minimization (ERM) chooses a model that minimizes the average loss on the training data.

#empirical risk minimization · #expected risk · #loss function · +12
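
A toy illustration of ERM over a deliberately tiny hypothesis class (the two constant predictors 0 and 1) under squared loss; the data and hypotheses are made up for illustration.

```python
import numpy as np

def empirical_risk(h, X, y, loss):
    """Average loss of hypothesis h over the training sample."""
    return np.mean([loss(h(x), yi) for x, yi in zip(X, y)])

squared = lambda pred, target: (pred - target) ** 2
X, y = np.arange(5), np.array([0, 0, 1, 1, 1])
hypotheses = [lambda x, c=c: c for c in (0, 1)]        # constant predictors
best = min(hypotheses, key=lambda h: empirical_risk(h, X, y, squared))
print(best(0))  # prints 1: the constant 1 has the lower average training loss (0.4 vs 0.6)
```
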
⚙️ Algorithm · Intermediate

Learning Rate Schedules

Learning rate schedules control how fast a model learns over time by changing the learning rate across iterations or epochs.

#learning rate schedules · #step decay · #cosine annealing · +12
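
Two common schedules written as plain functions of the epoch; the decay factor, interval, and horizon are arbitrary illustrative choices.

```python
import math

def step_decay(lr0, epoch, drop=0.1, every=30):
    """Multiply the base rate by `drop` every `every` epochs."""
    return lr0 * (drop ** (epoch // every))

def cosine_annealing(lr0, epoch, total_epochs, lr_min=0.0):
    """Anneal from lr0 down to lr_min along a half cosine over total_epochs."""
    return lr_min + 0.5 * (lr0 - lr_min) * (1 + math.cos(math.pi * epoch / total_epochs))

for e in (0, 30, 60, 90):
    print(e, step_decay(0.1, e), round(cosine_annealing(0.1, e, 90), 4))
```
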
⚙️ Algorithm · Intermediate

Adam & Adaptive Methods

Adam is an optimization algorithm that combines momentum (first moment) with RMSProp-style adaptive learning rates (second moment).

#adam · #adaptive methods · #rmsprop · +12
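
A minimal sketch of one Adam update with the commonly cited defaults; `t` is the 1-based step count used for bias correction.

```python
import numpy as np

def adam_step(w, m, v, grad, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: momentum-style first moment plus RMSProp-style second moment."""
    m = beta1 * m + (1 - beta1) * grad             # EMA of gradients (first moment)
    v = beta2 * v + (1 - beta2) * grad ** 2        # EMA of squared gradients (second moment)
    m_hat = m / (1 - beta1 ** t)                   # bias correction for the warm-up phase
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)    # per-coordinate adaptive step
    return w, m, v
```
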
⚙️ Algorithm · Intermediate

Momentum Methods

Momentum methods add an exponentially weighted memory of past gradients to make descent steps smoother and faster, especially in ravines and ill-conditioned problems.

#momentum · #heavy-ball · #polyak momentum · +12
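
A sketch of the Polyak heavy-ball update; Nesterov's variant would instead evaluate the gradient at the look-ahead point `w + beta * velocity`.

```python
import numpy as np

def heavy_ball_step(w, velocity, grad, lr=0.01, beta=0.9):
    """Heavy-ball momentum: velocity accumulates an exponentially weighted
    sum of past gradients, smoothing zig-zags across a ravine's steep walls."""
    velocity = beta * velocity - lr * grad
    return w + velocity, velocity
```
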
⚙️ Algorithm · Intermediate

Stochastic Gradient Descent (SGD)

Stochastic Gradient Descent (SGD) updates model parameters using small random subsets (mini-batches) of data, making learning faster and more memory-efficient.

#stochastic gradient descent · #mini-batch · #random shuffling · +12
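
A compact mini-batch SGD loop, assuming a hypothetical `grad_fn(w, X_batch, y_batch)` that returns the loss gradient averaged over the batch.

```python
import numpy as np

def sgd(w, X, y, grad_fn, lr=0.01, batch_size=32, epochs=10, seed=0):
    """Mini-batch SGD: reshuffle each epoch, then update on small random slices."""
    rng = np.random.default_rng(seed)
    n = len(X)
    for _ in range(epochs):
        order = rng.permutation(n)                   # random shuffling once per epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            w = w - lr * grad_fn(w, X[idx], y[idx])  # noisy estimate of the full gradient
    return w
```
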
📚 Theory · Intermediate

Randomized Algorithm Theory

Randomized algorithms use random bits to make choices that simplify design, avoid worst cases, and often speed up computation.

#randomized algorithms · #las vegas · #monte carlo · +12
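
Two throwaway examples of the standard split: a Las Vegas algorithm (randomized quickselect, always correct but with random running time) and a Monte Carlo estimate (π from random points, always fast but only approximately correct).

```python
import random

def quickselect(items, k, rng=random.Random(0)):
    """Las Vegas: the random pivot affects speed, never the answer (k is 0-indexed)."""
    pivot = rng.choice(items)
    lows = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    if k < len(lows):
        return quickselect(lows, k, rng)
    if k < len(lows) + len(equal):
        return pivot
    return quickselect([x for x in items if x > pivot], k - len(lows) - len(equal), rng)

def estimate_pi(n, rng=random.Random(0)):
    """Monte Carlo: the result is random, but close to pi with high probability."""
    hits = sum(rng.random() ** 2 + rng.random() ** 2 <= 1.0 for _ in range(n))
    return 4.0 * hits / n

print(quickselect([7, 2, 9, 4, 1], 2), estimate_pi(100_000))  # 4 and roughly 3.14
```
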
📚 Theory · Intermediate

Gradient Descent Convergence Theory

Gradient descent updates parameters by stepping opposite the gradient, x_{t+1} = x_t − η ∇f(x_t); convergence theory bounds how quickly f(x_t) approaches the optimum under assumptions such as L-smoothness.

#gradient descent · #convergence rate · #l-smooth · +12
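
A tiny numeric check of that update on an L-smooth quadratic, using the classical step size η = 1/L (here L is the largest eigenvalue of A); the matrix and iteration count are arbitrary.

```python
import numpy as np

# f(x) = 0.5 * x^T A x is L-smooth with L = largest eigenvalue of A.
A = np.diag([1.0, 10.0])      # L = 10, condition number 10
eta = 1.0 / 10.0              # eta = 1/L guarantees a monotone decrease of f
x = np.array([5.0, 5.0])
for t in range(50):
    grad = A @ x              # gradient of the quadratic
    x = x - eta * grad        # x_{t+1} = x_t - eta * grad f(x_t)
print(x, 0.5 * x @ A @ x)     # both close to zero: the iterates approach the minimizer
```
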