The natural gradient premultiplies the ordinary gradient by the inverse Fisher information matrix, so that update steps account for the geometry of the probability distributions the parameters define rather than the raw parameter coordinates.
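As a rough illustration of this definition, the sketch below takes one natural-gradient step when fitting a univariate Gaussian by minimizing the average negative log-likelihood. The model choice, the (mu, sigma) parameterization, and the helper names `nll_grad`, `fisher`, and `natural_gradient_step` are assumptions made for the example, not part of the original text. The Fisher matrix used is the standard closed form for a Gaussian in (mu, sigma) coordinates.

```python
import numpy as np

def nll_grad(theta, x):
    """Gradient of the average Gaussian negative log-likelihood w.r.t. (mu, sigma)."""
    mu, sigma = theta
    d_mu = -(x.mean() - mu) / sigma**2
    d_sigma = 1.0 / sigma - np.mean((x - mu) ** 2) / sigma**3
    return np.array([d_mu, d_sigma])

def fisher(theta):
    """Fisher information of N(mu, sigma^2) in (mu, sigma) coordinates."""
    _, sigma = theta
    return np.diag([1.0 / sigma**2, 2.0 / sigma**2])

def natural_gradient_step(theta, x, lr=1.0):
    """One update: theta - lr * F(theta)^{-1} * grad."""
    g = nll_grad(theta, x)
    # Solve F * nat_g = g instead of forming the inverse explicitly.
    nat_g = np.linalg.solve(fisher(theta), g)
    return theta - lr * nat_g

# Hypothetical data and starting point, chosen only for illustration.
x = np.array([1.0, 2.0, 3.0])
theta0 = np.array([0.0, 2.0])
theta1 = natural_gradient_step(theta0, x, lr=1.0)
```

In this particular setup the mu component of the natural gradient no longer depends on sigma, so with `lr=1.0` a single step moves mu to the sample mean. The ordinary gradient for mu, by contrast, is divided by sigma squared and so shrinks as sigma grows.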
Newton's method uses both the gradient and the Hessian to fit a quadratic model of the loss around the current point, then steps to that model's stationary point, which is a local minimum when the Hessian is positive definite.
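A minimal sketch of a single Newton update follows, assuming the caller supplies callables for the gradient and Hessian. The function name `newton_step` and the quadratic test problem are illustrative choices, not taken from the original text. On a quadratic loss with a positive-definite Hessian the quadratic model is exact, so one step lands on the minimizer.

```python
import numpy as np

def newton_step(grad_fn, hess_fn, x):
    """One Newton update: x - H(x)^{-1} g(x)."""
    g = grad_fn(x)
    H = hess_fn(x)
    # Solve H * step = g rather than inverting H.
    return x - np.linalg.solve(H, g)

# Illustrative quadratic loss f(x) = 0.5 * x^T A x - b^T x,
# with A symmetric positive definite (an assumption of this example).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])

grad_fn = lambda x: A @ x - b   # gradient of f
hess_fn = lambda x: A           # Hessian of f (constant for a quadratic)

x0 = np.array([5.0, -4.0])
x1 = newton_step(grad_fn, hess_fn, x0)

# At the minimizer the gradient vanishes, i.e. A @ x1 equals b.
print(np.allclose(A @ x1, b))
```

For non-quadratic losses the step is only as good as the local quadratic fit, so practical implementations usually add a line search or damping. That is omitted here for brevity.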