Value function approximation replaces a huge table of values with a small set of parameters that can generalize across similar states.
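As a minimal sketch of this idea, the following linear approximator represents V(s) as a dot product of a weight vector with a feature vector, updated by a TD(0) rule. The class name, feature map, and learning rate here are illustrative assumptions, not from the original text.

```python
import numpy as np

def features(state):
    # Hypothetical feature map: raw state values plus a bias term.
    return np.append(state, 1.0)

class LinearValueFunction:
    """Approximate V(s) ~ w . phi(s): a small weight vector replaces
    one table entry per state, so similar states share value estimates."""

    def __init__(self, n_features, lr=0.1):
        self.w = np.zeros(n_features)
        self.lr = lr

    def value(self, state):
        return float(self.w @ features(state))

    def td_update(self, state, reward, next_state, gamma=0.99):
        # TD(0): nudge w toward the bootstrapped target r + gamma * V(s').
        target = reward + gamma * self.value(next_state)
        error = target - self.value(state)
        self.w += self.lr * error * features(state)
        return error
```

Because the update adjusts shared weights rather than a single cell, any state with overlapping features also shifts, which is exactly the generalization the sentence describes.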
Proximal Policy Optimization (PPO) stabilizes policy gradient learning by preventing each update from moving the policy too far from the previous one.
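The "not too far from the previous policy" constraint can be illustrated with PPO's clipped surrogate objective. The function below is a simplified, single-sample sketch (the clip threshold of 0.2 is a common default, assumed here), not a full training loop.

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate objective (to be maximized).

    ratio = pi_new(a|s) / pi_old(a|s). Clipping the ratio to
    [1 - eps, 1 + eps] and taking the minimum removes any incentive
    to push the new policy far from the old one in a single update.
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return float(np.minimum(unclipped, clipped))
```

For a positive advantage, raising the ratio past 1 + eps yields no extra objective value, so gradient ascent has nothing to gain from a larger policy step, which is the stabilization mechanism the sentence refers to.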