Groups
Category
Proximal Policy Optimization (PPO) stabilizes policy gradient learning by preventing each update from moving the policy too far from the previous one.
Newton's method uses both the gradient and the Hessian to take steps that aim directly at the local optimum by fitting a quadratic model of the loss around the current point.