Concepts2

Groups

Temporal Difference Learning

Temporal Difference (TD) Learning updates value estimates by bootstrapping from the next state's current estimate, enabling fast, online learning.

#temporal difference learning#td(0)#sarsa+12

📚TheoryAdvanced

Policy Gradient Theorem

The policy gradient theorem tells us how to push a stochastic policy’s parameters to increase expected return by following the gradient of expected rewards.

#policy gradient