Groups
Category
A sparse matrix stores only its nonzero entries, saving huge amounts of memory when most entries are zero.
Natural gradient scales the ordinary gradient by the inverse Fisher information matrix to account for the geometry of probability distributions.