Generalization bounds upper-bound the gap between a model's training error and its error on unseen data, and are used to explain why deep neural networks can perform well on unseen data despite having many parameters.
The natural gradient multiplies the ordinary gradient by the inverse Fisher information matrix, so that update directions account for the geometry of the space of probability distributions rather than the raw parameter coordinates.
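As a minimal sketch of this rescaling, assume the model is a univariate Gaussian N(mu, sigma^2) parameterized by (mu, sigma), fitted by average negative log-likelihood. That choice of model, and the function names `nll_grad`, `fisher_diag`, and `natural_gradient`, are assumptions made for illustration only. For this parameterization the Fisher information matrix is diagonal, diag(1/sigma^2, 2/sigma^2), so applying its inverse reduces to an elementwise division.

```python
def nll_grad(mu, sigma, xs):
    """Ordinary gradient of the average negative log-likelihood
    of N(mu, sigma^2) with respect to (mu, sigma)."""
    n = len(xs)
    mean_resid = sum(x - mu for x in xs) / n
    mean_sq = sum((x - mu) ** 2 for x in xs) / n
    d_mu = -mean_resid / sigma**2
    d_sigma = 1.0 / sigma - mean_sq / sigma**3
    return [d_mu, d_sigma]


def fisher_diag(sigma):
    """Diagonal of the Fisher information matrix of N(mu, sigma^2)
    in the (mu, sigma) parameterization (off-diagonal entries are zero)."""
    return [1.0 / sigma**2, 2.0 / sigma**2]


def natural_gradient(mu, sigma, xs):
    """Natural gradient: inverse Fisher matrix applied to the ordinary gradient.
    Because the Fisher matrix is diagonal here, this is an elementwise division;
    a general model would need a linear solve instead."""
    g = nll_grad(mu, sigma, xs)
    f = fisher_diag(sigma)
    return [g_i / f_i for g_i, f_i in zip(g, f)]


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]
    print("ordinary gradient:", nll_grad(0.0, 2.0, data))
    print("natural gradient: ", natural_gradient(0.0, 2.0, data))
```

In this sketch the mu-component of the natural gradient equals mu minus the sample mean regardless of sigma, whereas the ordinary gradient shrinks as sigma grows. This is one way to see how the Fisher rescaling removes dependence on the parameterization's scale.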