Concepts (3)

Value Function Approximation

Value function approximation replaces a huge table of values with a small set of parameters that can generalize across similar states.

#reinforcement learning #value function approximation #linear function approximator (+12 more)
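As a rough illustration of the idea above, the sketch below estimates state values with a linear function approximator and updates it with a semi-gradient TD(0) step. The feature encoding (a bias term plus a normalized position), the helper names `features`, `value`, and `td0_update`, and the default step size and discount are all assumptions made for this example; the card itself does not specify them.

```python
def features(state, n_states):
    """Hypothetical encoding: a bias term plus the state's normalized position."""
    return [1.0, state / (n_states - 1)]


def value(weights, x):
    """Linear value estimate: dot product of the weights and the feature vector."""
    return sum(w * xi for w, xi in zip(weights, x))


def td0_update(weights, x, reward, x_next, done, alpha=0.1, gamma=0.9):
    """One semi-gradient TD(0) step; returns a new weight list."""
    # Bootstrap from the next state's estimate unless the episode has ended.
    target = reward + (0.0 if done else gamma * value(weights, x_next))
    td_error = target - value(weights, x)
    # For a linear approximator, the gradient of the estimate w.r.t. the weights is x.
    return [w + alpha * td_error * xi for w, xi in zip(weights, x)]


# Two weights cover every state, whether there are 10 states or 10,000.
weights = [0.0, 0.0]
x = features(50, 101)
weights = td0_update(weights, x, reward=1.0, x_next=x, done=True)
```

Because nearby states share similar feature values, an update made for one state also shifts the estimates of its neighbors, which is the generalization the description refers to.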

⚙️ Algorithm · Intermediate

PPO & Trust Region Methods

Proximal Policy Optimization (PPO) stabilizes policy gradient learning by preventing each update from moving the policy too far from the previous one.
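One common way PPO limits the size of each update is the clipped surrogate objective. The sketch below computes it for a single action, assuming the clipped variant is the one meant here (the description does not say which variant it has in mind). The function name `ppo_clip_objective` and the default `clip_eps=0.2` are illustrative choices, not taken from the source.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantage, clip_eps=0.2):
    """Clipped surrogate objective for one sampled action (to be maximized)."""
    # Probability ratio between the new and the old policy for this action.
    ratio = math.exp(logp_new - logp_old)
    # Restrict the ratio to the interval [1 - clip_eps, 1 + clip_eps].
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Taking the minimum removes any incentive to push the ratio outside the interval.
    return min(ratio * advantage, clipped_ratio * advantage)


# The new policy makes a good action 1.5x more likely, but the credit is capped at 1.2x.
capped = ppo_clip_objective(math.log(1.5), 0.0, advantage=1.0)
```

Once the ratio leaves the clipping interval in the direction the advantage favors, the objective becomes flat, so the gradient no longer pushes the policy further from the previous one.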

Concepts (3)

Value Function Approximation

PPO & Trust Region Methods

Policy Gradient Theorem