Metric learning learns a distance function from data so that similar items lie close together and dissimilar items lie far apart in a feature space.
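One common way to make this idea concrete is a triplet loss over a parameterised distance. The sketch below is only an illustration under stated assumptions: the distance is a weighted Euclidean distance, and the names `weighted_distance`, `triplet_loss`, `weights`, and `margin` are hypothetical and do not come from the text above. The loss is zero when the anchor is closer to the similar item than to the dissimilar item by at least the margin.

```python
import math

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance; the weights are the learnable part (assumed form)."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def triplet_loss(anchor, positive, negative, weights, margin=1.0):
    """Hinge loss that is zero once the positive is closer than the negative by `margin`."""
    d_pos = weighted_distance(anchor, positive, weights)
    d_neg = weighted_distance(anchor, negative, weights)
    return max(0.0, d_pos - d_neg + margin)

# Illustrative points: the positive differs from the anchor on the second
# feature, the negative differs on the first feature.
anchor, positive, negative = [0.0, 0.0], [0.0, 1.0], [3.0, 0.0]

good = triplet_loss(anchor, positive, negative, weights=[1.0, 1.0])
bad = triplet_loss(anchor, positive, negative, weights=[0.0, 1.0])
```

With equal weights the triplet is already satisfied, so the loss is zero. Ignoring the first feature collapses the negative onto the anchor and the loss becomes positive, which is the signal a learner would use to adjust the weights.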
The policy gradient theorem gives the gradient of expected return with respect to a stochastic policy's parameters, which tells us how to adjust those parameters to increase expected return.
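In its usual form the theorem reads grad J(theta) = E[ grad log pi_theta(a|s) * Q(s, a) ], where the expectation is taken over states and actions visited under the policy. As a minimal sketch, assume the simplest possible setting: a two-armed bandit with no state, a softmax policy, and fixed rewards, so that Q(s, a) reduces to the reward of the chosen arm. The function names, reward values, learning rate, and seed below are all illustrative assumptions.

```python
import math
import random

def softmax(theta):
    """Turn preferences into action probabilities (shifted by the max for stability)."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]

def run_bandit(steps=2000, lr=0.1, seed=0):
    """Score-function (REINFORCE-style) updates on a hypothetical two-armed bandit."""
    rng = random.Random(seed)
    rewards = [0.0, 1.0]   # assumed fixed reward per arm; arm 1 is better
    theta = [0.0, 0.0]     # policy parameters: one preference per arm
    for _ in range(steps):
        probs = softmax(theta)
        action = 0 if rng.random() < probs[0] else 1
        ret = rewards[action]
        for k in range(len(theta)):
            # For a softmax policy, d/d theta_k of log pi(action) is 1[k == action] - pi_k.
            grad_log = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += lr * ret * grad_log
    return softmax(theta)

final_probs = run_bandit()
```

Each update is a single-sample estimate of the gradient in the theorem. Because only the rewarded arm produces a nonzero update here, the probability of choosing that arm never decreases over the run.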