Minimum Description Length (MDL) picks the model that compresses the data best by minimizing L(M) + L(D|M).
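A minimal numeric sketch of the MDL tradeoff on an assumed toy problem: encode a binary string either with a fair-coin model (cheap to describe, poor fit) or a fitted Bernoulli model (costs extra bits to transmit the parameter, better fit). The 32-bit parameter cost is an illustrative assumption, not a standard value.

```python
import math

# Toy data with structure: mostly zeros (256 symbols, 64 ones).
data = "0001000100010001" * 16

def code_length(data, p, model_bits):
    # Total description length in bits: L(M) + L(D|M),
    # where L(D|M) is the negative log2-likelihood under Bernoulli(p).
    ll = sum(-math.log2(p if c == "1" else 1 - p) for c in data)
    return model_bits + ll

# Model A: fair coin, essentially free to describe.
cost_a = code_length(data, 0.5, model_bits=1)

# Model B: fitted p, paying an assumed 32 bits to transmit the parameter.
p_hat = data.count("1") / len(data)
cost_b = code_length(data, p_hat, model_bits=32)

print(cost_a, cost_b)  # MDL picks the model with the smaller total
```

With enough data, the fitted model's shorter data code outweighs its parameter cost, so MDL prefers it; on a very short string the fair coin would win instead.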
Softmax turns arbitrary real-valued scores (logits) into probabilities that sum to one.
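A minimal softmax sketch: exponentiate the logits (after subtracting the max for numerical stability) and normalize so the outputs sum to one.

```python
import math

def softmax(logits):
    m = max(logits)                      # shift for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)  # the largest logit gets the most probability mass
```

Subtracting the max changes nothing mathematically (the shift cancels in the ratio) but prevents overflow for large logits.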
The Information Bottleneck (IB) principle formalizes learning compact representations T that keep only the information about X that is useful for predicting Y.
The Maximum Entropy Principle picks the probability distribution with the greatest uncertainty (entropy) that still satisfies the facts you know (constraints).
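A sketch of the principle on an assumed toy problem: among distributions over die faces 1..6 with a fixed mean, the maximum-entropy solution has the Gibbs form p(k) ∝ exp(λk); here λ is found by bisection so the mean constraint holds.

```python
import math

KS = range(1, 7)

def gibbs(lam):
    # Exponential-family form of the MaxEnt solution under a mean constraint.
    w = [math.exp(lam * k) for k in KS]
    z = sum(w)
    return [x / z for x in w]

def mean(p):
    return sum(k * pk for k, pk in zip(KS, p))

# Bisection on the Lagrange multiplier: mean(gibbs(lam)) is increasing in lam.
target = 4.5
lo, hi = -5.0, 5.0
for _ in range(100):
    mid = (lo + hi) / 2
    if mean(gibbs(mid)) < target:
        lo = mid
    else:
        hi = mid

p = gibbs((lo + hi) / 2)
print(p)  # tilted toward 6 so the mean is 4.5, but otherwise as flat as possible
```

With no constraint beyond normalization, λ = 0 and the solution collapses to the uniform distribution, the most familiar MaxEnt answer.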
Cross-entropy measures how well a proposed distribution Q predicts outcomes actually generated by a true distribution P.
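A minimal cross-entropy sketch, H(P, Q) = −Σₓ P(x) log Q(x) (in nats here): it is smallest when Q matches P, where it equals the entropy H(P).

```python
import math

def cross_entropy(p, q):
    # -sum_x P(x) log Q(x); terms with P(x) = 0 contribute nothing.
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.25]                      # "true" distribution
print(cross_entropy(p, p))                 # equals H(P) when Q = P
print(cross_entropy(p, [1/3, 1/3, 1/3]))   # larger: Q mispredicts P
```

The gap between the two values is exactly the KL divergence D(P‖Q), which is why minimizing cross-entropy loss in classification also minimizes KL to the label distribution.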
KL divergence measures how much information is lost when using model Q to approximate the true distribution P.
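A minimal KL divergence sketch, D(P‖Q) = Σₓ P(x) log(P(x)/Q(x)): it is zero when Q matches P, positive otherwise, and not symmetric.

```python
import math

def kl(p, q):
    # sum_x P(x) log(P(x)/Q(x)); terms with P(x) = 0 contribute nothing.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.7, 0.2, 0.1]
q = [1/3, 1/3, 1/3]
print(kl(p, p))  # 0: no information lost when Q matches P
print(kl(p, q))  # > 0, and kl(p, q) != kl(q, p) in general
```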
Variational Inference (VI) replaces an intractable posterior with a simpler distribution and optimizes it by minimizing KL divergence, which is equivalent to maximizing the ELBO.
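A discrete toy sketch of the VI identity, under an assumed three-state model: the ELBO equals log Z − KL(q‖p), so over all candidate distributions q it peaks exactly when q is the true (normalized) posterior.

```python
import math

# Unnormalized posterior p~(z) over three latent states (assumed toy values).
p_unnorm = [3.0, 1.0, 1.0]
Z = sum(p_unnorm)
posterior = [w / Z for w in p_unnorm]       # true posterior p(z) = p~(z)/Z

def elbo(q):
    # ELBO = E_q[log p~(z)] - E_q[log q(z)] = log Z - KL(q || p).
    return sum(qi * (math.log(w) - math.log(qi))
               for qi, w in zip(q, p_unnorm) if qi > 0)

print(elbo(posterior))          # equals log Z, the maximum
print(elbo([1/3, 1/3, 1/3]))    # lower: this q pays a KL penalty
```

In practice the posterior is intractable and q is restricted to a tractable family, so maximizing the ELBO finds the family member closest to the posterior in KL.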
Mutual Information (MI) measures how much knowing one random variable reduces uncertainty about another.
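A minimal MI sketch from a joint table, I(X;Y) = Σ p(x,y) log[p(x,y) / (p(x)p(y))]: zero for independent variables, positive when one variable carries information about the other.

```python
import math

def mutual_information(joint):
    # joint[i][j] = p(x_i, y_j); marginals come from row and column sums.
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(
        pxy * math.log(pxy / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
        if pxy > 0
    )

independent = [[0.25, 0.25], [0.25, 0.25]]
correlated = [[0.45, 0.05], [0.05, 0.45]]
print(mutual_information(independent))  # 0: knowing X says nothing about Y
print(mutual_information(correlated))   # > 0: X reduces uncertainty about Y
```

Equivalently, I(X;Y) is the KL divergence between the joint p(x,y) and the product of its marginals p(x)p(y).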
Kullback–Leibler (KL) divergence measures how differently one probability distribution P allocates probability mass compared to a reference distribution Q.
Information theory quantifies uncertainty and information using measures like entropy, cross-entropy, KL divergence, and mutual information.