A Mixture of Experts (MoE) layer uses a learned gating network to route each input to a small subset of specialized subnetworks called experts, so only a fraction of the model's parameters are active for any given input (conditional computation).
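A minimal sketch of top-k MoE routing, assuming randomly initialized linear experts and a linear gate (the class and parameter names here are illustrative, not from any particular library):

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

class MoELayer:
    """Top-k mixture of experts: a linear gate scores every expert,
    only the k best-scoring experts actually run on the input, and
    their outputs are averaged with renormalized gate weights."""
    def __init__(self, n_experts, dim, k=2, seed=0):
        rng = random.Random(seed)
        self.k = k
        # Each expert is a dim x dim linear map (toy stand-in for an MLP).
        self.experts = [[[rng.gauss(0, 0.1) for _ in range(dim)]
                         for _ in range(dim)] for _ in range(n_experts)]
        # Gate: one scoring vector per expert.
        self.gate = [[rng.gauss(0, 0.1) for _ in range(dim)]
                     for _ in range(n_experts)]

    def __call__(self, x):
        scores = [sum(w * xi for w, xi in zip(row, x)) for row in self.gate]
        probs = softmax(scores)
        # Select the top-k experts; the rest are skipped entirely.
        topk = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:self.k]
        norm = sum(probs[i] for i in topk)
        out = [0.0] * len(x)
        for i in topk:
            weight = probs[i] / norm
            y = [sum(m * xi for m, xi in zip(row, x)) for row in self.experts[i]]
            out = [o + weight * yi for o, yi in zip(out, y)]
        return out
```

With 8 experts and k=2, each input touches only 2 of the 8 expert weight matrices, which is the source of the compute savings.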
Dropout randomly zeroes a fraction of neuron activations during training, which discourages co-adaptation between neurons and reduces overfitting; at inference time all neurons are kept active.
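A sketch of "inverted" dropout, the common formulation in which survivors are scaled by 1/(1-p) during training so the expected activation is unchanged and inference needs no rescaling (the function name and signature are illustrative):

```python
import random

def dropout(xs, p=0.5, training=True, rng=random):
    """Inverted dropout over a list of activations.

    During training, each value is zeroed with probability p and the
    survivors are scaled by 1/(1-p) so the expected value matches the
    no-dropout case. At inference the input passes through unchanged.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(xs)
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in xs]
```

Note that each training pass samples a fresh random mask, so the same input can produce different outputs across calls.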