How I Study AI - Learn AI Papers & Lectures the Easy Way

Concepts (2), filtered by #time complexity in the Attention & Transformer Theory group

Groups

šŸ“Linear Algebra15šŸ“ˆCalculus & Differentiation10šŸŽÆOptimization14šŸŽ²Probability Theory12šŸ“ŠStatistics for ML9šŸ“”Information Theory10šŸ”ŗConvex Optimization7šŸ”¢Numerical Methods6šŸ•øGraph Theory for Deep Learning6šŸ”µTopology for ML5🌐Differential Geometry6āˆžMeasure Theory & Functional Analysis6šŸŽ°Random Matrix Theory5🌊Fourier Analysis & Signal Processing9šŸŽ°Sampling & Monte Carlo Methods10🧠Deep Learning Theory12šŸ›”ļøRegularization Theory11šŸ‘ļøAttention & Transformer Theory10šŸŽØGenerative Model Theory11šŸ”®Representation Learning10šŸŽ®Reinforcement Learning Mathematics9šŸ”„Variational Methods8šŸ“‰Loss Functions & Objectives10ā±ļøSequence & Temporal Models8šŸ’ŽGeometric Deep Learning8

āš™ļøAlgorithmIntermediate

Efficient Attention Mechanisms

Standard softmax attention costs O(n²) time and memory in the sequence length because every token compares with every other token (see the sketch below).

#linear attention Ā· #efficient attention Ā· #kernel trick Ā· +12
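To make the contrast concrete, here is a minimal NumPy sketch, not code from this site: the function names, shapes, and the simple ReLU feature map are illustrative assumptions. It compares quadratic softmax attention with a kernelized linear-attention variant that reassociates the matrix products so the \(n \times n\) score matrix is never materialized.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention: the (n x n) score matrix makes this O(n^2) in sequence length.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                     # (n, n)
    scores -= scores.max(axis=-1, keepdims=True)        # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over keys
    return weights @ V                                   # (n, d_v)

def linear_attention(Q, K, V, feature_map=lambda x: np.maximum(x, 0.0) + 1e-6):
    # Kernel trick: replace softmax(QK^T) with phi(Q) phi(K)^T, then reassociate
    # (phi(Q) phi(K)^T) V  as  phi(Q) (phi(K)^T V), giving O(n) cost in sequence length.
    Qf, Kf = feature_map(Q), feature_map(K)              # (n, d_k) each, elementwise positive map
    KV = Kf.T @ V                                        # (d_k, d_v), summed over positions once
    Z = Qf @ Kf.sum(axis=0, keepdims=True).T             # (n, 1) normalizer
    return (Qf @ KV) / Z                                 # (n, d_v)

rng = np.random.default_rng(0)
n, d_k, d_v = 8, 4, 4
Q, K, V = rng.normal(size=(n, d_k)), rng.normal(size=(n, d_k)), rng.normal(size=(n, d_v))
print(softmax_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)  # (8, 4) (8, 4)
```

The key step is computing \(\phi(K)^\top V\) once, a \(d_k \times d_v\) matrix, so the per-token cost no longer grows with sequence length.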
šŸ“š Theory Ā· Intermediate

Scaled Dot-Product Attention

Scaled dot-product attention scores how much each value in V should contribute to a query by taking dot products of the query with the keys K, scaling by \(\sqrt{d_k}\), applying a softmax, and forming the weighted sum of the values (see the worked example below).

#scaled dot-product attention Ā· #softmax Ā· #transformer Ā· +10
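Written out, that recipe is the usual formula
\[
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^\top}{\sqrt{d_k}}\right) V,
\]
where \(d_k\) is the key dimension. A minimal NumPy sketch of the same computation follows; the function name and the example shapes are illustrative assumptions, not code from this site.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v) -> (n_q, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # similarity of each query with each key
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability before exponentiating
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys: each row sums to 1
    return weights @ V                               # weighted sum of values per query

rng = np.random.default_rng(1)
Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(5, 8)), rng.normal(size=(5, 16))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 16)
```

Dividing by \(\sqrt{d_k}\) keeps the dot products from growing with the key dimension, which would otherwise push the softmax toward a near one-hot distribution with very small gradients.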