Groups
Category
The Evidence Lower Bound (ELBO) is a tractable lower bound on the log evidence log p(x) that enables learning and inference in latent variable models like VAEs.
The policy gradient theorem tells us how to push a stochastic policyโs parameters to increase expected return by following the gradient of expected rewards.