Concepts2

Groups

Reparameterization Trick

The reparameterization trick rewrites a random variable as a deterministic function of noise that does not depend on the parameters, such as z = μ + σ · ε with ε ~ N(0, 1).

#reparameterization trick#pathwise derivative#variational autoencoder+11

📚TheoryAdvanced

Policy Gradient Theorem

The policy gradient theorem tells us how to push a stochastic policy’s parameters to increase expected return by following the gradient of expected rewards.