Groups
Category
Level
Proximal Policy Optimization (PPO) stabilizes policy gradient learning by preventing each update from moving the policy too far from the previous one.