Groups
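The discounted return G_t is the sum of all future rewards from time t onward, with each reward multiplied by a power of the discount factor γ (0 ≤ γ ≤ 1) that increases with its delay, so that for γ < 1 distant rewards count for less.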
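As a minimal sketch of this definition, the snippet below computes a discounted return for a finite list of rewards. It assumes that rewards[0] is the first reward received after time t and is weighted by γ^0, that rewards[1] is weighted by γ^1, and so on; the function name discounted_return and the sample numbers are illustrative and not taken from the text.

```python
def discounted_return(rewards, gamma):
    """Return sum over k of gamma**k * rewards[k] for a finite reward sequence.

    Assumption: rewards[0] is the first reward after time t (weight gamma**0).
    """
    g = 0.0
    # Work backwards through the sequence, applying G = r + gamma * G_next
    # at each step, which avoids computing explicit powers of gamma.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Three rewards of 1.0 with gamma = 0.5: 1 + 0.5 + 0.25
example = discounted_return([1.0, 1.0, 1.0], 0.5)
print(example)  # 1.75
```

With gamma = 1.0 the function reduces to a plain sum, and with gamma = 0.0 only the first reward counts.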
A Markov Decision Process (MDP) models decision-making in situations where outcomes are partly random and partly under the control of a decision maker.
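To make the split between chance and control concrete, here is a small sketch of an MDP as a transition table. Everything in it is hypothetical: the state names ("cool", "hot"), the action names ("slow", "fast"), the probabilities, and the rewards are made up for illustration. The decision maker chooses the action; the step function then samples the outcome at random.

```python
import random

# Hypothetical two-state MDP. P[state][action] is a list of
# (probability, next_state, reward) triples; each list sums to 1.
P = {
    "cool": {
        "slow": [(1.0, "cool", 1.0)],
        "fast": [(0.5, "cool", 2.0), (0.5, "hot", 2.0)],
    },
    "hot": {
        "slow": [(1.0, "cool", 0.0)],
        "fast": [(1.0, "hot", -1.0)],
    },
}


def step(state, action, rng):
    """Sample (next_state, reward) for a chosen action.

    The action is under the decision maker's control; which of the
    listed outcomes occurs is decided by the random generator rng.
    """
    outcomes = P[state][action]
    weights = [prob for prob, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward


rng = random.Random(0)  # seeded so repeated runs sample the same outcomes
print(step("cool", "slow", rng))  # deterministic transition: ('cool', 1.0)
print(step("cool", "fast", rng))  # random: ends in 'cool' or 'hot'
```

The Markov property shows up in the table's shape: the outcome distribution depends only on the current state and action, not on how the process arrived there.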