Groups
The discounted return G_t is the sum of all future rewards from time t onward, where the first reward has weight 1 and each later reward is weighted by one more power of the discount factor γ (0 ≤ γ ≤ 1).
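As a rough illustration of the definition above, the sketch below computes the discounted return for a finite list of rewards. The function name `discounted_return` and the convention that `rewards[0]` is the first reward received after time t are assumptions made for this example, not part of the original notes.

```python
def discounted_return(rewards, gamma):
    """Discounted sum of a finite reward sequence.

    Assumption for this sketch: rewards[0] is the first reward received
    after time t and gets weight gamma**0 = 1; rewards[k] gets gamma**k.
    """
    g = 0.0
    # Work backwards using the recursion G = r + gamma * G_next,
    # which avoids computing explicit powers of gamma.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


if __name__ == "__main__":
    # Three rewards of 1 with gamma = 0.5: 1 + 0.5 + 0.25
    print(discounted_return([1.0, 1.0, 1.0], 0.5))  # 1.75
```

The backward loop relies on the fact that the return at one step equals the immediate reward plus γ times the return from the next step, which is the same recursion that TD methods build on.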
Temporal Difference (TD) learning updates a state's value estimate by bootstrapping from the current estimate of the next state's value, so it can learn online, after every step, without waiting for the episode's final return.
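One common instance of this idea is the tabular TD(0) update, sketched below under stated assumptions. The function name `td0_update`, the dictionary-based value table `V`, the step size `alpha`, and the `terminal` flag are all hypothetical choices for this example; the notes themselves do not specify a particular TD variant.

```python
def td0_update(V, state, reward, next_state, alpha, gamma, terminal=False):
    """Apply one tabular TD(0) update to V[state] in place.

    V is a dict mapping states to value estimates (an assumption of this
    sketch). Returns the TD error used for the update.
    """
    # Bootstrapped target: observed reward plus the discounted *current
    # estimate* of the next state's value (taken as 0 at a terminal state).
    bootstrap = 0.0 if terminal else gamma * V[next_state]
    target = reward + bootstrap
    td_error = target - V[state]
    # Move the estimate a fraction alpha of the way toward the target.
    V[state] += alpha * td_error
    return td_error


if __name__ == "__main__":
    V = {"A": 0.0, "B": 1.0}
    # Target is 1.0 + 0.5 * V["B"] = 1.5, so V["A"] moves halfway to 1.5.
    td0_update(V, "A", reward=1.0, next_state="B", alpha=0.5, gamma=0.5)
    print(V["A"])  # 0.75
```

Because the update needs only a single transition (state, reward, next state), it can be applied immediately after each step rather than at the end of an episode.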