Groups
Value function approximation replaces a huge table of values with a small set of parameters that can generalize across similar states.
The policy gradient theorem tells us how to push a stochastic policyโs parameters to increase expected return by following the gradient of expected rewards.