A Mixture of Experts (MoE) routes each input to a small subset of specialized models called experts, enabling conditional computation.
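The routing-and-combine step can be sketched in a few lines. This is a minimal toy, not a real MoE layer: the experts are plain scalar functions, the gate scores are given directly rather than produced by a learned router, and `moe_forward`, `softmax`, and `k` are names chosen here for illustration.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, gate_scores, k=2):
    """Route input x to the top-k experts by gate probability and
    combine their outputs, weighted by renormalized probabilities."""
    probs = softmax(gate_scores)
    topk = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in topk)
    # Only the k selected experts are evaluated: conditional computation.
    return sum((probs[i] / total) * experts[i](x) for i in topk)

# Toy experts: each is just a scalar function here.
experts = [lambda x: 2 * x, lambda x: x + 1, lambda x: -x]
out = moe_forward(3.0, experts, gate_scores=[2.0, 1.0, 0.1], k=2)
```

With `k=2`, only the two highest-scoring experts run; the third contributes no compute at all, which is the point of conditional computation.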
The policy gradient theorem tells us how to adjust a stochastic policy's parameters to increase expected return: follow the gradient of the expected reward with respect to those parameters.
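In its common form the theorem can be written as follows (notation assumed here: parameters $\theta$, policy $\pi_\theta$, objective $J$, and action-value function $Q^{\pi_\theta}$; the source does not fix a notation):

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{s \sim d^{\pi_\theta},\, a \sim \pi_\theta}
    \left[ \nabla_\theta \log \pi_\theta(a \mid s) \, Q^{\pi_\theta}(s, a) \right]
```

The key point is that the gradient of the expected return reduces to an expectation over the policy's own state-action distribution, so it can be estimated from sampled trajectories without differentiating through the environment dynamics.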