Proximal Policy Optimization (PPO) stabilizes policy gradient learning by preventing each update from moving the policy too far from the previous one.
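One common way to enforce this limit is the clipped surrogate objective. The sketch below assumes that variant (PPO also has a KL-penalty form, and the text above does not say which is meant). The function name `ppo_clip_objective`, its arguments, and the default `clip_eps=0.2` are illustrative choices, not taken from the source.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch (to be maximized).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the previous policy. advantages: advantage estimates.
    All names and the default clip_eps are assumptions for illustration.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the new and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to the interval [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a positive advantage, any ratio above `1 + clip_eps` earns no extra objective value, so the optimizer gains nothing by pushing the policy further from the previous one on that sample.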
The natural gradient premultiplies the ordinary gradient by the inverse Fisher information matrix, so that the update direction accounts for the geometry of the space of probability distributions rather than of the raw parameters.
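As a small worked case, consider a univariate Gaussian parameterized by its mean and standard deviation. Its Fisher information matrix is diagonal, with entries 1/sigma^2 for the mean and 2/sigma^2 for the standard deviation, so the inverse can be applied in closed form. The function name and the choice of distribution below are assumptions made for illustration only.

```python
def natural_gradient_gaussian(grad_mu, grad_sigma, sigma):
    """Natural gradient for N(mu, sigma^2) in (mu, sigma) coordinates.

    The Fisher matrix is diag(1 / sigma^2, 2 / sigma^2), so multiplying
    by its inverse rescales each component of the ordinary gradient.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    nat_mu = (sigma ** 2) * grad_mu            # inverse of 1 / sigma^2
    nat_sigma = (sigma ** 2 / 2.0) * grad_sigma  # inverse of 2 / sigma^2
    return nat_mu, nat_sigma
```

For a broad distribution (large sigma) the same ordinary gradient yields a larger natural-gradient step, because a given parameter change alters the distribution less. In models where the Fisher matrix is not diagonal, one would typically solve a linear system instead of inverting entry by entry.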