Concepts2

Groups

PPO & Trust Region Methods

Proximal Policy Optimization (PPO) stabilizes policy gradient learning by preventing each update from moving the policy too far from the previous one.

#ppo#trust region#trpo+11

📚TheoryAdvanced

Policy Gradient Theorem

The policy gradient theorem tells us how to push a stochastic policy’s parameters to increase expected return by following the gradient of expected rewards.

#policy gradient