How I Study AI - Learn AI Papers & Lectures the Easy Way

Concepts (2)

Groups

šŸ“Linear Algebra15šŸ“ˆCalculus & Differentiation10šŸŽÆOptimization14šŸŽ²Probability Theory12šŸ“ŠStatistics for ML9šŸ“”Information Theory10šŸ”ŗConvex Optimization7šŸ”¢Numerical Methods6šŸ•øGraph Theory for Deep Learning6šŸ”µTopology for ML5🌐Differential Geometry6āˆžMeasure Theory & Functional Analysis6šŸŽ°Random Matrix Theory5🌊Fourier Analysis & Signal Processing9šŸŽ°Sampling & Monte Carlo Methods10🧠Deep Learning Theory12šŸ›”ļøRegularization Theory11šŸ‘ļøAttention & Transformer Theory10šŸŽØGenerative Model Theory11šŸ”®Representation Learning10šŸŽ®Reinforcement Learning Mathematics9šŸ”„Variational Methods8šŸ“‰Loss Functions & Objectives10ā±ļøSequence & Temporal Models8šŸ’ŽGeometric Deep Learning8

Category

šŸ”· All · āˆ‘ Math · āš™ļø Algo · šŸ—‚ļø DS · šŸ“š Theory

Level

All · Beginner · Intermediate
āš™ļøAlgorithmIntermediate

Efficient Attention Mechanisms

Standard softmax attention costs O(n²) in sequence length because every token compares with every other token; the sketch after this card contrasts that quadratic form with a kernelized linear-attention variant.

#linear attention · #efficient attention · #kernel trick · +12
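A minimal NumPy sketch, not taken from the card or any specific paper: the quadratic version materializes the full n Ɨ n score matrix, while the kernelized variant replaces the softmax with a feature map and reassociates the matrix products so cost grows linearly in n. The function names and the feature map `phi` (ReLU plus a small constant for positivity) are illustrative assumptions, not a particular method's choices.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Materializes the full n x n score matrix: O(n^2) time and memory.
    scores = (Q @ K.T) / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    # Kernel trick (illustrative phi): replace softmax(QK^T) with phi(Q) phi(K)^T,
    # then reassociate so only d x d summaries are formed: O(n d^2) instead of O(n^2 d).
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                 # (d, d) summary built in one pass over tokens
    Z = Qp @ Kp.sum(axis=0)       # per-query normalizer, shape (n,)
    return (Qp @ KV) / Z[:, None]

n, d = 512, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
print(softmax_attention(Q, K, V).shape)  # (512, 64)
print(linear_attention(Q, K, V).shape)   # (512, 64)
```

With the products reassociated, per-token cost depends on the head dimension d rather than the sequence length n, which is the core idea behind kernelized attention families.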
šŸ“š Theory · Intermediate

Universal Approximation Theorem

The Universal Approximation Theorem (UAT) says a feedforward neural network with one hidden layer and a non-polynomial activation (like sigmoid or ReLU) can approximate any continuous function on a compact set as closely as we want; the sketch after this card illustrates the idea with random sigmoid features.

#random features · #universal approximation theorem · #cybenko · #hornik · +12
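A minimal sketch, again illustrative rather than from the card: a single hidden layer of random sigmoid features is fit to a continuous target by least squares on a compact interval, and the sup-norm error shrinks as the width grows. The target function, the widths tried, and the random-weight scale are arbitrary demo assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 400)[:, None]   # compact domain
y = np.sin(3 * x).ravel()                      # continuous target (demo choice)

def hidden(x, W, b):
    # Single hidden layer with a sigmoid (non-polynomial) activation, as UAT requires.
    return 1.0 / (1.0 + np.exp(-(x @ W + b)))

for width in (5, 50, 500):
    W = 5.0 * rng.standard_normal((1, width))  # random, untrained hidden weights
    b = 5.0 * rng.standard_normal(width)
    H = hidden(x, W, b)
    coef, *_ = np.linalg.lstsq(H, y, rcond=None)  # train only the output weights
    err = np.max(np.abs(H @ coef - y))
    print(f"width={width:4d}  sup-error={err:.4f}")
```

Increasing the width drives the sup-norm error down, which is the empirical face of the theorem; note the theorem itself is an existence guarantee about trained networks, so random untrained features like these converge more slowly than the best possible single-hidden-layer fit.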