Groups
Category
Temporal Difference (TD) Learning updates value estimates by bootstrapping from the next state's current estimate, enabling fast, online learning.
A Markov Decision Process (MDP) models decision-making in situations where outcomes are partly random and partly under the control of a decision maker.