How I Study AI - Learn AI Papers & Lectures the Easy Way

Concepts (6)

Groups

📐 Linear Algebra (15) · 📈 Calculus & Differentiation (10) · 🎯 Optimization (14) · 🎲 Probability Theory (12) · 📊 Statistics for ML (9) · 📡 Information Theory (10) · 🔺 Convex Optimization (7) · 🔢 Numerical Methods (6) · 🕸 Graph Theory for Deep Learning (6) · 🔵 Topology for ML (5) · 🌐 Differential Geometry (6) · ∞ Measure Theory & Functional Analysis (6) · 🎰 Random Matrix Theory (5) · 🌊 Fourier Analysis & Signal Processing (9) · 🎰 Sampling & Monte Carlo Methods (10) · 🧠 Deep Learning Theory (12) · 🛡️ Regularization Theory (11) · 👁️ Attention & Transformer Theory (10) · 🎨 Generative Model Theory (11) · 🔮 Representation Learning (10) · 🎮 Reinforcement Learning Mathematics (9) · 🔄 Variational Methods (8) · 📉 Loss Functions & Objectives (10) · ⏱️ Sequence & Temporal Models (8) · 💎 Geometric Deep Learning (8)

Category

🔷 All · ∑ Math · ⚙️ Algo · 🗂️ DS · 📚 Theory

Level

All · Beginner · Intermediate
📚 Theory · Intermediate

Implicit Bias of Gradient Descent

In underdetermined linear systems (more variables than equations), gradient descent started at zero converges to the minimum Euclidean norm solution without any explicit regularizer.

#implicit bias · #gradient descent · #minimum norm · +12
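A minimal NumPy sketch (added here for illustration, not part of the card): run plain gradient descent from a zero initialization on an underdetermined least-squares problem and compare the result with the pseudoinverse (minimum-norm) solution.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 20))     # 5 equations, 20 unknowns: underdetermined
b = rng.normal(size=5)

x = np.zeros(20)                 # the zero initialization is what gives the implicit bias
lr = 0.01                        # small enough for stability on this problem
for _ in range(50_000):
    grad = A.T @ (A @ x - b)     # gradient of 0.5 * ||Ax - b||^2
    x -= lr * grad

x_min_norm = np.linalg.pinv(A) @ b     # minimum Euclidean norm interpolating solution
print(np.linalg.norm(x - x_min_norm))  # ≈ 0: GD converged to the min-norm solution
```

Starting the same loop from a nonzero point generally gives a different interpolating solution, which is the sense in which the bias comes from the initialization rather than from the loss itself.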
⚙️ Algorithm · Intermediate

Gradient Clipping & Normalization

Gradient clipping limits how large gradient values or their overall magnitude can become during optimization to prevent exploding updates.

#gradient clipping · #learning rate · #clipping by norm · #clipping by value · +12
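A short sketch of the two variants named in the tags, written for this page rather than quoted from it: elementwise clipping by value and rescaling by global norm.

```python
import numpy as np

def clip_by_value(grad, limit=1.0):
    """Elementwise: force every component into [-limit, limit]."""
    return np.clip(grad, -limit, limit)

def clip_by_norm(grad, max_norm=1.0):
    """Global norm: if ||grad|| exceeds max_norm, rescale it; direction is preserved."""
    norm = np.linalg.norm(grad)
    return grad * (max_norm / norm) if norm > max_norm else grad

g = np.array([3.0, -4.0])              # norm 5
print(clip_by_value(g, limit=1.0))     # [ 1. -1.]   (direction changes)
print(clip_by_norm(g, max_norm=1.0))   # [ 0.6 -0.8] (direction preserved)
```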
⚙️ Algorithm · Intermediate

Adam & Adaptive Methods

Adam is an optimization algorithm that combines momentum (first moment) with RMSProp-style adaptive learning rates (second moment).

#adam · #adaptive methods · #rmsprop · +12
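A minimal sketch of one Adam step as described above: a momentum-style first-moment estimate, an RMSProp-style second-moment estimate, and bias correction. The hyperparameter values are the commonly used defaults, assumed here rather than taken from the card.

```python
import numpy as np

def adam_step(x, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update on parameters x given gradient grad at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad            # first moment: EMA of gradients (momentum)
    v = beta2 * v + (1 - beta2) * grad ** 2       # second moment: EMA of squared gradients
    m_hat = m / (1 - beta1 ** t)                  # bias correction for the zero initialization
    v_hat = v / (1 - beta2 ** t)
    x = x - lr * m_hat / (np.sqrt(v_hat) + eps)   # per-coordinate adaptive step
    return x, m, v

# toy usage: minimize f(x) = ||x||^2, whose gradient is 2x
x = np.array([1.0, -2.0]); m = np.zeros_like(x); v = np.zeros_like(x)
for t in range(1, 5001):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=1e-2)
print(x)  # close to [0, 0]
```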
⚙️ Algorithm · Intermediate

Momentum Methods

Momentum methods add an exponentially weighted memory of past gradients to make descent steps smoother and faster, especially in ravines and ill-conditioned problems.

#momentum · #heavy-ball · #polyak momentum · +12
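A brief heavy-ball (Polyak) sketch, written here as an illustration: the velocity is an exponentially weighted memory of past gradients, and the parameters move along it. The quadratic below is deliberately ill-conditioned, i.e. a ravine.

```python
import numpy as np

def momentum_step(x, grad, velocity, lr=0.02, beta=0.9):
    """Heavy-ball update: accumulate gradients into a velocity, then step along it."""
    velocity = beta * velocity + grad   # exponentially weighted gradient memory
    x = x - lr * velocity
    return x, velocity

# ill-conditioned quadratic f(x) = 0.5 * x^T H x with curvatures 100 and 1 (a ravine)
H = np.diag([100.0, 1.0])
x = np.array([1.0, 1.0])
v = np.zeros_like(x)
for _ in range(1000):
    x, v = momentum_step(x, H @ x, v)
print(x)  # ≈ [0, 0]; plain GD at this learning rate would oscillate forever
          # along the steep axis, since 1 - 0.02 * 100 = -1
```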
⚙️ Algorithm · Intermediate

Gradient Descent

Gradient descent is a simple, repeatable way to move downhill on a loss surface by stepping in the opposite direction of the gradient.

#gradient descent · #batch gradient descent · #learning rate · +12
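A minimal sketch of the loop itself, added as an illustration rather than taken from the card: fixed learning rate, one step opposite the gradient per iteration.

```python
import numpy as np

def gradient_descent(grad_fn, x0, lr=0.1, steps=100):
    """Plain (batch) gradient descent: repeat x <- x - lr * grad_f(x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * grad_fn(x)
    return x

# minimize f(x) = (x - 3)^2; its gradient is 2 * (x - 3)
x_star = gradient_descent(lambda x: 2 * (x - 3), x0=[0.0], lr=0.1, steps=100)
print(x_star)  # ≈ [3.]
```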
📚 Theory · Intermediate

Gradient Descent Convergence Theory

Gradient descent updates parameters by stepping opposite the gradient: $x_{t+1} = x_t - \eta \nabla f(x_t)$.

#gradient descent · #convergence rate · #l-smooth · +12
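For reference, the standard textbook guarantee behind this card (not quoted from the page): for a convex, $L$-smooth $f$ and step size $\eta = 1/L$, gradient descent satisfies

```latex
% Convergence of gradient descent on a convex, L-smooth function with \eta = 1/L
f(x_T) - f(x^\star) \;\le\; \frac{L \,\lVert x_0 - x^\star \rVert^2}{2T}
```

i.e. an $O(1/T)$ rate in function value; under strong convexity the rate improves to linear (geometric).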