Entropy, divergence, and mutual information — measuring uncertainty and information in learning systems.
10 concepts
Shannon entropy quantifies the average uncertainty or information content of a random variable in bits when using base-2 logarithms.
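A minimal sketch of this definition, using toy coin distributions of my own choosing: a fair coin carries exactly 1 bit of uncertainty, a biased coin less.

```python
import math

def shannon_entropy(p):
    """Entropy in bits of a discrete distribution p (list of probabilities)."""
    # Terms with probability 0 contribute nothing (0 * log 0 -> 0).
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

print(shannon_entropy([0.5, 0.5]))  # 1.0 bit for a fair coin
print(shannon_entropy([0.9, 0.1]))  # ~0.469 bits for a heavily biased coin
```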
KL divergence measures how much information is lost when using model Q to approximate the true distribution P.
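A small sketch with example distributions chosen here for illustration; note that KL divergence is asymmetric, so D(P‖Q) and D(Q‖P) generally differ.

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) in bits; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]  # true distribution: a fair coin
q = [0.9, 0.1]  # model: a biased coin
print(kl_divergence(p, q))  # > 0: bits lost by coding fair-coin data with q
print(kl_divergence(p, p))  # 0.0: no loss when the model is exact
```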
Cross-entropy measures how well a proposed distribution Q predicts outcomes actually generated by a true distribution P.
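A short sketch, using the same toy distributions as above, that also checks numerically the standard identity H(P, Q) = H(P) + D_KL(P‖Q).

```python
import math

def cross_entropy(p, q):
    """H(P, Q) in bits: expected code length under model Q for data from P."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]
h_p = -sum(pi * math.log2(pi) for pi in p)                       # H(P)
d_kl = sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))      # D_KL(P||Q)

# Cross-entropy decomposes into irreducible uncertainty plus model mismatch.
assert abs(cross_entropy(p, q) - (h_p + d_kl)) < 1e-9
```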
Mutual information (MI) measures how much knowing one random variable reduces uncertainty about another.
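A minimal sketch from a joint distribution, with two illustrative extremes: perfectly correlated bits share a full bit of information, while independent bits share none.

```python
import math

def mutual_information(joint):
    """I(X;Y) in bits from a joint distribution given as a 2D list."""
    px = [sum(row) for row in joint]            # marginal of X (rows)
    py = [sum(col) for col in zip(*joint)]      # marginal of Y (columns)
    return sum(
        pxy * math.log2(pxy / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
        if pxy > 0
    )

print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))    # 1.0: Y = X exactly
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0: independent
```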
Rényi entropy generalizes Shannon entropy by measuring uncertainty with a tunable emphasis on common versus rare outcomes.
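A brief sketch of the order-α formula H_α(p) = log2(Σ p_i^α) / (1 − α), on an example biased coin: raising α weights common outcomes more heavily, so the entropy falls; as α → 1 it recovers Shannon entropy.

```python
import math

def renyi_entropy(p, alpha):
    """Rényi entropy of order alpha (alpha > 0, alpha != 1), in bits."""
    return math.log2(sum(pi ** alpha for pi in p)) / (1 - alpha)

p = [0.9, 0.1]
# alpha < 1 emphasizes rare outcomes; alpha > 1 emphasizes common ones.
print(renyi_entropy(p, 0.5))  # ~0.678, above the Shannon value (~0.469)
print(renyi_entropy(p, 2.0))  # ~0.286, below it
```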
Minimum Description Length (MDL) picks the model that compresses the data best by minimizing L(M) + L(D|M): the bits needed to describe the model plus the bits needed to describe the data given that model.
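A toy two-part-code sketch under assumptions chosen here: the data are coin flips, a fitted one-parameter model is charged (1/2)·log2(n) bits for its parameter (a common MDL convention), and the data cost is the negative log-likelihood in bits.

```python
import math

def nll_bits(data, p_heads):
    """L(D|M): negative log-likelihood in bits of coin flips (1 = heads)."""
    return -sum(math.log2(p_heads if x else 1 - p_heads) for x in data)

data = [1] * 90 + [0] * 10  # 90 heads, 10 tails

# Model A: fair coin, no parameters to encode, so L(M) = 0.
cost_fair = 0 + nll_bits(data, 0.5)
# Model B: fitted bias, parameter charged (1/2) log2(n) bits.
p_hat = sum(data) / len(data)
cost_fitted = 0.5 * math.log2(len(data)) + nll_bits(data, p_hat)

# The fitted model's shorter data code outweighs its parameter cost here.
print(cost_fair, cost_fitted)
```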
The Information Bottleneck (IB) principle formalizes the tradeoff between compressing an input X and preserving information about a target Y using the objective min_{p(t|x)} I(X;T) − β I(T;Y).
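A sketch that only evaluates the IB objective for two hand-picked encoders (it does not optimize it): when X strongly predicts Y and β is large, a copying encoder T = X scores better than one that collapses T to a constant. The joint distribution and β here are illustrative choices.

```python
import math

def mi(joint):
    """Mutual information in bits from a 2D joint distribution."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    return sum(
        pab * math.log2(pab / (pa[i] * pb[j]))
        for i, row in enumerate(joint)
        for j, pab in enumerate(row)
        if pab > 0
    )

def ib_objective(pxy, pt_given_x, beta):
    """I(X;T) - beta * I(T;Y) for a stochastic encoder p(t|x)."""
    px = [sum(row) for row in pxy]
    pxt = [[px[i] * pt_given_x[i][t] for t in range(len(pt_given_x[0]))]
           for i in range(len(px))]
    pty = [[sum(pt_given_x[i][t] * pxy[i][j] for i in range(len(px)))
            for j in range(len(pxy[0]))]
           for t in range(len(pt_given_x[0]))]
    return mi(pxt) - beta * mi(pty)

pxy = [[0.45, 0.05], [0.05, 0.45]]   # X strongly predicts Y
copy = [[1.0, 0.0], [0.0, 1.0]]      # T = X: keeps all information
collapse = [[1.0, 0.0], [1.0, 0.0]]  # T constant: discards everything
print(ib_objective(pxy, copy, beta=5.0))      # negative: info worth keeping
print(ib_objective(pxy, collapse, beta=5.0))  # 0.0
```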
Rate–distortion theory tells you the minimum number of bits per symbol needed to represent data while keeping average distortion below a target D.
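A sketch of one case with a known closed form: a Bernoulli(p) source under Hamming distortion has R(D) = H(p) − H(D) for 0 ≤ D ≤ min(p, 1 − p), and R(D) = 0 beyond that. The example values are illustrative.

```python
import math

def h2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def rate_distortion_bernoulli(p, d):
    """R(D) for a Bernoulli(p) source under Hamming (bit-flip) distortion."""
    return h2(p) - h2(d) if d < min(p, 1 - p) else 0.0

# A fair bit costs 1 bit losslessly, but fewer if we tolerate 10% error.
print(rate_distortion_bernoulli(0.5, 0.0))  # 1.0
print(rate_distortion_bernoulli(0.5, 0.1))  # ~0.531
```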