Groups
Category
The WakeโSleep algorithm trains a pair of models: a generative model that explains how data are produced and a recognition model that guesses hidden causes from observed data.
Proximal Policy Optimization (PPO) stabilizes policy gradient learning by preventing each update from moving the policy too far from the previous one.