Groups
An f-divergence measures how different two probability distributions P and Q are by averaging a convex function f of the density ratio p(x)/q(x) under Q.
The Information Bottleneck (IB) principle formalizes learning compact representations T that keep only the information about X that is useful for predicting Y.