
How I Study AI - Learn AI Papers & Lectures the Easy Way

Concepts (10)


📚Theory · Intermediate

Mixture of Experts (MoE)

A Mixture of Experts (MoE) uses a gating network to route each input to a small subset of specialized sub-networks called experts, so only a fraction of the model's parameters is active for any given input (conditional computation).

#mixture of experts#moe#gating network+12
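A minimal top-k routing sketch in plain Python, assuming a linear gate (one hypothetical weight vector per expert), a softmax renormalized over only the selected experts, and experts that preserve the input dimension. The names `moe_forward`, `gate_weights`, and `k` are illustrative, not taken from any particular library.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, gate_weights, experts, k=2):
    # Gate logits: one linear score per expert (hypothetical linear gate).
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    # Conditional computation: keep only the k highest-scoring experts.
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalize the gate probabilities over the selected experts only.
    gates = softmax([logits[i] for i in top])
    out = [0.0] * len(x)  # assumes every expert preserves the input dimension
    for g, i in zip(gates, top):
        y = experts[i](x)  # unselected experts are never evaluated
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, top

# Three toy experts and a toy gate (all hypothetical).
experts = [
    lambda v: [2 * a for a in v],
    lambda v: [a + 1 for a in v],
    lambda v: [0.0 for _ in v],
]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
y, chosen = moe_forward([1.0, 3.0], gate_weights, experts, k=2)
```

With this input the gate picks experts 1 and 0; expert 2 is never called, which is where an MoE layer saves compute relative to running every expert.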

📚Theory · Intermediate

Key-Value Memory Systems

Key-value memory systems store information as (key, value) pairs and retrieve values by comparing a query with the keys by similarity, rather than requiring an exact match.

#key-value memory#attention#scaled dot-product+12

Group: Attention & Transformer Theory
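A sketch of a soft key-value read, assuming dot-product similarity and softmax weighting (one common choice; other similarity functions are possible). `kv_read` and its `temperature` parameter are hypothetical names used only for illustration.

```python
import math

def kv_read(query, keys, values, temperature=1.0):
    # Similarity between the query and every stored key (dot product, an assumption).
    scores = [sum(q * k for q, k in zip(query, key)) / temperature for key in keys]
    # Softmax over scores gives a soft "address" over memory slots.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Read-out: similarity-weighted average of the stored values.
    dim = len(values[0])
    read = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return read, weights

keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0], [20.0]]
read, weights = kv_read([0.9, 0.1], keys, values)
```

A query that matches no key exactly still returns a blend of values dominated by the most similar key; lowering `temperature` pushes the read toward a hard nearest-key lookup.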

∑Math · Intermediate

Softmax & Temperature Scaling

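Softmax turns arbitrary real-valued scores (logits) into probabilities that sum to one. Dividing the logits by a temperature \(T > 0\) before applying softmax makes the distribution sharper (\(T < 1\)) or flatter (\(T > 1\)).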

#softmax#temperature scaling#logits+12
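A small, numerically stable softmax with a temperature parameter, as an illustration. Subtracting the maximum logit does not change the result, because softmax is invariant to adding the same constant to every logit; it only prevents overflow in `math.exp`.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; temperature rescales logits before exponentiation."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability (result unchanged)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
sharp = softmax(logits, temperature=0.5)
flat = softmax(logits, temperature=2.0)
```

As the temperature drops, probability mass concentrates on the largest logit; as it rises, the distribution approaches uniform.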

📚Theory · Advanced

In-Context Learning Theory

In-context learning (ICL) means a model learns from examples provided in the input itself, without updating its parameters.

#in-context learning#transformer#attention+12
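A toy illustration, assuming a simplified setting often used in ICL theory: linear-regression examples \((x_i, y_i)\) placed in the prompt and a single unnormalized linear attention layer (no softmax). There, attention with keys \(x_i\), values \(y_i\), and query \(x_q\) gives exactly the prediction of one gradient-descent step from \(w = 0\), scaled by the learning rate. Both function names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def linear_attention_predict(context, x_query):
    # Unnormalized linear attention: keys x_i, values y_i, query x_query.
    return sum(dot(x_query, x_i) * y_i for x_i, y_i in context)

def one_step_gd_predict(context, x_query, lr):
    # Linear regression: start at w = 0, take one gradient step on
    # L(w) = 0.5 * sum_i (w . x_i - y_i)^2, then predict w . x_query.
    dim = len(x_query)
    w = [0.0] * dim
    grad = [0.0] * dim
    for x_i, y_i in context:
        err = dot(w, x_i) - y_i  # equals -y_i because w = 0
        for j in range(dim):
            grad[j] += err * x_i[j]
    w = [w_j - lr * g_j for w_j, g_j in zip(w, grad)]
    return dot(w, x_query)

context = [([1.0, 2.0], 3.0), ([0.0, 1.0], 1.0), ([2.0, 0.0], 4.0)]
x_q = [1.0, 1.0]
attn = linear_attention_predict(context, x_q)
gd = one_step_gd_predict(context, x_q, lr=0.1)
```

No parameters are updated in the attention computation, yet its output matches a learner that "trained" on the in-context examples, which is one way to make the idea of learning from the prompt concrete.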

📚Theory · Advanced

Transformer Expressiveness

Transformer expressiveness studies what kinds of sequence-to-sequence mappings a Transformer can represent or approximate.

#transformer expressiveness#universal approximation#self-attention+12

∑Math · Intermediate

Positional Encoding Mathematics

Sinusoidal positional encoding represents each token’s position using pairs of sine and cosine waves at exponentially spaced frequencies.

#positional encoding#sinusoidal#transformer+11
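A sketch of the sinusoidal encoding with the base 10000 used in the original Transformer, assuming the common layout where dimension \(2i\) holds a sine and dimension \(2i+1\) the matching cosine. `sinusoidal_encoding` is an illustrative name.

```python
import math

def sinusoidal_encoding(pos, d_model, base=10000.0):
    """PE[2i] = sin(pos / base^(2i/d_model)), PE[2i+1] = cos(pos / base^(2i/d_model))."""
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    pe = []
    for i in range(d_model // 2):
        # Frequencies shrink geometrically: 1, base^(-2/d), base^(-4/d), ...
        angle = pos / base ** (2 * i / d_model)
        pe.extend([math.sin(angle), math.cos(angle)])  # dims 2i and 2i+1
    return pe

pe_3 = sinusoidal_encoding(3, 8)
pe_5 = sinusoidal_encoding(5, 8)
```

Each (sin, cos) pair at frequency \(\omega_i\) is a point on the unit circle, so shifting the position by \(k\) rotates that pair by the angle \(k\omega_i\): a linear map that depends only on the offset, not on the absolute position.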

⚙️Algorithm · Intermediate

Efficient Attention Mechanisms

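Standard softmax attention costs O(n²) in sequence length n because every token compares with every other token. Efficient variants, such as kernel-based linear attention, reduce this to O(n) by never forming the full n × n score matrix.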

#linear attention#efficient attention#kernel trick+12
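A sketch of non-causal kernel-based linear attention, assuming the feature map \(\phi(x) = \mathrm{elu}(x) + 1\), one common choice. Note that this replaces the softmax kernel \(\exp(q \cdot k)\) with \(\phi(q) \cdot \phi(k)\), so it is a different attention function rather than an exact rewrite of softmax attention. The saving comes from associativity: keys and values are summarized once into \(S\) and \(z\). All function names are hypothetical.

```python
import math

def phi(v):
    # Positive feature map elu(x) + 1 (an assumed, commonly used choice).
    return [x + 1.0 if x > 0 else math.exp(x) for x in v]

def linear_attention(Q, K, V):
    """O(n) version: out_i = phi(q_i)^T S / (phi(q_i)^T z),
    with S = sum_j phi(k_j) v_j^T and z = sum_j phi(k_j)."""
    d, dv = len(K[0]), len(V[0])
    S = [[0.0] * dv for _ in range(d)]
    z = [0.0] * d
    for k, v in zip(K, V):  # single pass over keys/values
        fk = phi(k)
        for a in range(d):
            z[a] += fk[a]
            for b in range(dv):
                S[a][b] += fk[a] * v[b]
    out = []
    for q in Q:  # single pass over queries; no n x n matrix is built
        fq = phi(q)
        denom = sum(fq[a] * z[a] for a in range(d))
        out.append([sum(fq[a] * S[a][b] for a in range(d)) / denom for b in range(dv)])
    return out

def quadratic_kernel_attention(Q, K, V):
    # Same kernel computed the O(n^2) way, for comparison only.
    out = []
    for q in Q:
        fq = phi(q)
        w = [sum(x * y for x, y in zip(fq, phi(k))) for k in K]
        total = sum(w)
        out.append([sum(wj * v[b] for wj, v in zip(w, V)) / total for b in range(len(V[0]))])
    return out

Q = [[0.5, -1.0], [1.0, 0.2], [-0.3, 0.4]]
K = [[1.0, 0.0], [-0.5, 0.5], [0.2, -0.2]]
V = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]
fast = linear_attention(Q, K, V)
slow = quadratic_kernel_attention(Q, K, V)
```

The two functions return the same numbers; only the order of operations, and therefore how the cost grows with n, differs.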

📚Theory · Intermediate

Self-Attention as Graph Neural Network

Self-attention can be viewed as message passing on a fully connected graph where each token (node) sends a weighted message to every other token.

#self-attention#graph neural network#message passing+11
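A sketch of one round of attention-weighted message passing, assuming identity query/key/value projections to keep it short. Passing a complete graph (every node receives from every node, itself included) reproduces ordinary self-attention; restricting `edges` turns the same code into attention over an arbitrary graph. The names are illustrative.

```python
import math

def attention_message_passing(h, edges):
    """h: list of node feature vectors, used directly as query, key and value
    (identity projections, a simplifying assumption).
    edges: dict mapping each node to the neighbours it receives messages from."""
    d = len(h[0])
    new_h = []
    for i, hi in enumerate(h):
        nbrs = edges[i]
        # Edge scores: scaled dot product between receiver and each sender.
        scores = [sum(a * b for a, b in zip(hi, h[j])) / math.sqrt(d) for j in nbrs]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        total = sum(w)
        # Aggregate: softmax-weighted sum of neighbour messages.
        new_h.append([sum(wj * h[j][c] for wj, j in zip(w, nbrs)) / total for c in range(d)])
    return new_h

def complete_graph(n):
    # Fully connected graph with self-loops: standard self-attention.
    return {i: list(range(n)) for i in range(n)}

h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
updated = attention_message_passing(h, complete_graph(len(h)))
```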

📚Theory · Intermediate

Multi-Head Attention

Multi-Head Attention runs several attention mechanisms in parallel so each head can focus on different relationships in the data.

#multi-head attention#scaled dot-product attention#transformer+12
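A plain-Python sketch of multi-head attention, assuming one hypothetical projection matrix per head for queries, keys, and values, plus an output projection `Wo` applied to the concatenated head outputs; biases are omitted.

```python
import math

def matmul(A, B):
    # Plain-list matrix product: (n x m) @ (m x p) -> (n x p).
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax_rows(S):
    out = []
    for row in S:
        m = max(row)
        e = [math.exp(x - m) for x in row]
        t = sum(e)
        out.append([x / t for x in e])
    return out

def attention(Q, K, V):
    # Scaled dot-product attention for a single head.
    d_k = len(K[0])
    scores = [[sum(q * k for q, k in zip(qr, kr)) / math.sqrt(d_k) for kr in K] for qr in Q]
    return matmul(softmax_rows(scores), V)

def multi_head_attention(X, Wq, Wk, Wv, Wo):
    """X: (n x d_model). Wq/Wk/Wv: one (d_model x d_head) matrix per head
    (hypothetical weights). Wo: (num_heads * d_head x d_model)."""
    heads = [attention(matmul(X, wq), matmul(X, wk), matmul(X, wv))
             for wq, wk, wv in zip(Wq, Wk, Wv)]
    # Concatenate each token's head outputs along the feature axis.
    concat = [[x for head in heads for x in head[i]] for i in range(len(X))]
    return matmul(concat, Wo)

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
I = [[1.0, 0.0], [0.0, 1.0]]
# Two heads with d_head = 1: head 0 sees feature 0, head 1 sees feature 1.
W_heads = [[[1.0], [0.0]], [[0.0], [1.0]]]
out = multi_head_attention(X, W_heads, W_heads, W_heads, I)
```

Because each head computes its own softmax over a different projection, head 0 and head 1 here produce different attention patterns over the same three tokens.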

📚Theory · Intermediate

Scaled Dot-Product Attention

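Scaled dot-product attention determines how much each value in V contributes to a query: it takes dot products of the query with the keys K, divides them by \(\sqrt{d_k}\), applies softmax to obtain weights, and returns the weighted sum of the values, i.e. \(\mathrm{softmax}(QK^\top/\sqrt{d_k})\,V\).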

#scaled dot-product attention#softmax#transformer+10
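A direct translation of \(\mathrm{softmax}(QK^\top/\sqrt{d_k})\,V\) into plain Python, with an optional boolean mask (for example, a causal mask) implemented by setting disallowed scores to \(-\infty\). It assumes every query row has at least one allowed key; the function name is illustrative.

```python
import math

def scaled_dot_product_attention(Q, K, V, mask=None):
    """softmax(Q K^T / sqrt(d_k)) V for lists of row vectors.
    mask[i][j] is False where query i must not attend to key j.
    Returns (output, attention_weights)."""
    d_k = len(K[0])
    out, weights = [], []
    for i, q in enumerate(Q):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        if mask is not None:
            scores = [s if mask[i][j] else float("-inf") for j, s in enumerate(scores)]
        m = max(scores)  # assumes at least one allowed key in this row
        e = [math.exp(s - m) for s in scores]  # exp(-inf) evaluates to 0.0
        total = sum(e)
        w = [x / total for x in e]
        weights.append(w)
        out.append([sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))])
    return out, weights

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
causal = [[j <= i for j in range(3)] for i in range(3)]
out, w = scaled_dot_product_attention(X, X, X, mask=causal)
```

Under the causal mask the first token can only attend to itself, so its output is its own value vector, and every weight above the diagonal is zero.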