Category
Level
The policy gradient theorem tells us how to push a stochastic policyβs parameters to increase expected return by following the gradient of expected rewards.