Proximal Policy Optimization (PPO) stabilizes policy gradient learning by preventing each update from moving the policy too far from the previous one.
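A minimal sketch of PPO's clipped surrogate objective in NumPy; the function name, the toy batch, and the clip range ε = 0.2 are illustrative assumptions, not a reference implementation:

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """PPO clipped surrogate loss (to be minimized).

    logp_new / logp_old: log-probabilities of the taken actions under
    the current and the pre-update policy; advantages: estimated
    advantages for those actions.
    """
    ratio = np.exp(logp_new - logp_old)                    # r_t(theta)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantages
    # The elementwise minimum caps how much an update can gain by
    # moving the policy far from the old one, which is what keeps
    # each step conservative.
    return -np.mean(np.minimum(unclipped, clipped))

# Toy usage on a small random batch (hypothetical data).
rng = np.random.default_rng(0)
logp_old = rng.normal(size=8)
logp_new = logp_old + rng.normal(scale=0.1, size=8)
adv = rng.normal(size=8)
print(ppo_clip_loss(logp_new, logp_old, adv))
```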
t-SNE and UMAP are nonlinear dimensionality-reduction methods that preserve local neighborhoods to make high-dimensional data visible in 2D or 3D.
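A short sketch of the usual workflow, assuming scikit-learn's `TSNE` and the third-party `umap-learn` package for UMAP; the digits dataset and parameter values here are illustrative choices:

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# 64-dimensional digit images projected to 2D for visualization;
# `labels` would typically color the resulting scatter plot.
X, labels = load_digits(return_X_y=True)
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(emb.shape)  # (1797, 2)

# UMAP (from the umap-learn package) follows the same fit/transform pattern:
#   import umap
#   emb = umap.UMAP(n_components=2, random_state=0).fit_transform(X)
```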