🎓How I Study AIHISA
📖Read
📄Papers📰Blogs🎬Courses
💡Learn
🛤️Paths📚Topics💡Concepts🎴Shorts
🎯Practice
📝Daily Log🎯Prompts🧠Review
SearchSettings
How I Study AI - Learn AI Papers & Lectures the Easy Way

Concepts13

Groups

📐Linear Algebra15📈Calculus & Differentiation10🎯Optimization14🎲Probability Theory12📊Statistics for ML9📡Information Theory10🔺Convex Optimization7🔢Numerical Methods6🕸Graph Theory for Deep Learning6🔵Topology for ML5🌐Differential Geometry6∞Measure Theory & Functional Analysis6🎰Random Matrix Theory5🌊Fourier Analysis & Signal Processing9🎰Sampling & Monte Carlo Methods10🧠Deep Learning Theory12🛡️Regularization Theory11👁️Attention & Transformer Theory10🎨Generative Model Theory11🔮Representation Learning10🎮Reinforcement Learning Mathematics9🔄Variational Methods8📉Loss Functions & Objectives10⏱️Sequence & Temporal Models8💎Geometric Deep Learning8

Category

🔷All∑Math⚙️Algo🗂️DS📚Theory

Level

AllBeginnerIntermediate
📚TheoryIntermediate

Sequence-to-Sequence with Attention

Sequence-to-sequence with attention lets a decoder focus on the most relevant parts of the input at each output step, rather than compressing everything into a single vector.

#sequence-to-sequence#attention#encoder-decoder+12
📚TheoryIntermediate

Recurrent Neural Network Theory

A Recurrent Neural Network (RNN) processes sequences by carrying a hidden state that is updated at every time step using h_t = f(W_h h_{t-1} + W_x x_t + b).

12
Advanced
Filtering by:
#softmax
#recurrent neural network
#rnn
#backpropagation through time
+12
📚TheoryIntermediate

Knowledge Distillation Loss

Knowledge distillation loss blends standard hard-label cross-entropy with a soft distribution match from a teacher using a temperature parameter.

#knowledge distillation#kd loss#temperature scaling+12
📚TheoryIntermediate

Focal Loss

Focal Loss reshapes cross-entropy so that hard, misclassified examples get more focus while easy, well-classified ones are down-weighted.

#focal loss#class imbalance#cross-entropy+11
📚TheoryIntermediate

Key-Value Memory Systems

Key-Value memory systems store information as pairs where keys are used to look up values by similarity rather than exact match.

#key-value memory#attention#scaled dot-product+12
📚TheoryIntermediate

Self-Attention as Graph Neural Network

Self-attention can be viewed as message passing on a fully connected graph where each token (node) sends a weighted message to every other token.

#self-attention#graph neural network#message passing+11
📚TheoryIntermediate

Multi-Head Attention

Multi-Head Attention runs several attention mechanisms in parallel so each head can focus on different relationships in the data.

#multi-head attention#scaled dot-product attention#transformer+12
📚TheoryIntermediate

Scaled Dot-Product Attention

Scaled dot-product attention scores how much each value V should contribute to a query by taking dot products with keys K, scaling by \(\sqrt{d_k}\), applying softmax, and forming a weighted sum.

#scaled dot-product attention#softmax#transformer+10
📚TheoryIntermediate

Label Smoothing

Label smoothing replaces a hard one-hot target with a slightly softened distribution to reduce model overconfidence.

#label smoothing#cross-entropy#softmax+12
📚TheoryIntermediate

Positional Encoding Theory

Transformers are permutation-invariant by default, so they need positional encodings to understand word order in sequences.

#positional encoding#sinusoidal encoding#transformer+11
📚TheoryIntermediate

Message Passing Framework

Message Passing Neural Networks (MPNNs) learn on graphs by letting nodes repeatedly exchange and aggregate messages from their neighbors.

#message passing neural network#mpnn#graph neural network+12
📚TheoryIntermediate

Cross-Entropy

Cross-entropy measures how well a proposed distribution Q predicts outcomes actually generated by a true distribution P.

#cross-entropy#entropy#kl divergence+12