Groups
Category
The policy gradient theorem tells us how to push a stochastic policyโs parameters to increase expected return by following the gradient of expected rewards.
Reinforcement Learning (RL) studies how an agent learns to act in an environment to maximize long-term cumulative reward.