Groups
Category
Stochastic Variational Inference (SVI) scales variational inference to large datasets by taking noisy but unbiased gradient steps using minibatches.
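To make the "noisy but unbiased" idea concrete, here is a minimal sketch on a toy problem chosen only for illustration; none of it comes from the original text. It assumes a model x_i ~ N(mu, 1) with prior mu ~ N(0, prior_var) and a Gaussian variational distribution whose mean m is the only parameter being optimized. The function names, the step-size schedule, and the batch size are all hypothetical choices. The minibatch gradient rescales the likelihood term by N / B so that its expectation equals the full-data gradient.

```python
import random


def full_gradient(data, m, prior_var):
    """Gradient of the ELBO with respect to the variational mean m, using all data."""
    return sum(x - m for x in data) - m / prior_var


def minibatch_gradient(batch, n_total, m, prior_var):
    """Unbiased estimate of full_gradient: rescale the minibatch sum by N / B."""
    scale = n_total / len(batch)
    return scale * sum(x - m for x in batch) - m / prior_var


def run_svi(data, prior_var=10.0, batch_size=50, steps=2000, seed=0):
    """Stochastic gradient ascent on the ELBO in m (toy setup, assumed here)."""
    rng = random.Random(seed)
    n = len(data)
    # Curvature of the ELBO in m; dividing by it keeps the step size stable.
    curvature = n + 1.0 / prior_var
    m = 0.0
    for t in range(steps):
        batch = rng.sample(data, batch_size)  # uniform minibatch, no replacement
        rho = (t + 1) ** -0.7  # decaying step size (Robbins-Monro conditions)
        m += (rho / curvature) * minibatch_gradient(batch, n, m, prior_var)
    return m


prior_var = 10.0
data_rng = random.Random(1)
data = [data_rng.gauss(2.0, 1.0) for _ in range(1000)]

m_svi = run_svi(data, prior_var=prior_var)
# Exact posterior mean for this conjugate model, for comparison.
m_exact = sum(data) / (len(data) + 1.0 / prior_var)
print(m_svi, m_exact)
```

Each step touches only 50 of the 1000 points, yet the estimate should settle near the exact posterior mean because the gradient noise averages out as the step size decays. Practical SVI implementations typically also update the variational variance and often use natural gradients, which this sketch omits.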
Proximal Policy Optimization (PPO) stabilizes policy gradient learning by preventing each update from moving the policy too far from the previous one.
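One common way PPO limits the size of an update is the clipped surrogate objective; the sketch below illustrates that variant only, and it is an assumption that this is the variant meant here (a KL-penalty variant also exists). The function name and the default `clip_eps=0.2` are illustrative choices. The probability ratio between the new and old policy is clipped to [1 - clip_eps, 1 + clip_eps], and the objective takes the more pessimistic of the clipped and unclipped terms.

```python
import math


def clipped_surrogate(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch of sampled actions.

    new_logp / old_logp: log-probabilities of the taken actions under the
    new and old policies; advantages: advantage estimates for those actions.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic bound: no extra reward for pushing the ratio past the clip range.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# The new policy makes a good action (advantage +1) much more likely: ratio = 1.8.
# The objective is capped at (1 + clip_eps) * advantage.
print(clipped_surrogate([math.log(0.9)], [math.log(0.5)], [1.0]))
```

Once the ratio leaves the clip range in the direction the advantage favors, the objective stops increasing, so its gradient gives no incentive to move the policy further from the previous one on that sample.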