Concepts2
πTheoryAdvanced
Policy Gradient Theorem
The policy gradient theorem tells us how to push a stochastic policyβs parameters to increase expected return by following the gradient of expected rewards.
#policy gradient#reinforce#actor-critic+11
πTheoryAdvanced
Reinforcement Learning Theory
Reinforcement Learning (RL) studies how an agent learns to act in an environment to maximize long-term cumulative reward.
#reinforcement learning#mdp#bellman equation+12