Papers (2)

#proximal policy optimization

Neural Predictor-Corrector: Solving Homotopy Problems with Reinforcement Learning

Jiayao Mai, Bangyan Liao et al. · Feb 3 · arXiv

This paper shows that many hard math and AI problems can be solved with one shared idea called homotopy, where we move from an easy version of a problem to the real one step by step.

#homotopy continuation #predictor-corrector #reinforcement learning
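To make the "easy version to real problem" idea concrete, here is a minimal sketch of classical predictor-corrector homotopy continuation for a scalar equation. It is not the paper's neural method: the function name `solve_by_homotopy`, the linear homotopy H(x, t) = (1 - t) g(x) + t f(x), the Euler predictor, the Newton corrector, and the fixed step count are all assumptions chosen for illustration.

```python
def solve_by_homotopy(f, df, g, dg, x0, steps=50, newton_iters=5):
    """Track a root of H(x, t) = (1 - t) * g(x) + t * f(x) from t = 0 to t = 1.

    g is the easy problem (x0 must satisfy g(x0) = 0); f is the real problem.
    Hypothetical helper for illustration, not taken from the paper.
    """
    x = x0
    for k in range(steps):
        t = k / steps
        t_next = (k + 1) / steps
        # Predictor: one Euler step along dx/dt = -H_t / H_x.
        h_x = (1 - t) * dg(x) + t * df(x)
        h_t = f(x) - g(x)
        x = x - (t_next - t) * h_t / h_x
        # Corrector: a few Newton iterations on H(x, t_next) = 0.
        for _ in range(newton_iters):
            h = (1 - t_next) * g(x) + t_next * f(x)
            h_x = (1 - t_next) * dg(x) + t_next * df(x)
            x = x - h / h_x
    return x


# Real problem: x^3 - 2x - 5 = 0. Easy problem: x - 1 = 0, whose root is x0 = 1.
root = solve_by_homotopy(
    f=lambda x: x**3 - 2 * x - 5,
    df=lambda x: 3 * x**2 - 2,
    g=lambda x: x - 1,
    dg=lambda x: 1.0,
    x0=1.0,
)
print(root)
```

The fixed step size and fixed number of corrector iterations are the kind of hand-tuned choices that a learned policy could plausibly replace, though how the paper does so is not stated in the summary above.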

Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs

Intermediate

Junbo Li, Peng Zhou et al. · Dec 18 · arXiv

Turn-PPO is a new way to train chatty AI agents that act over many steps, by judging each conversation turn as one whole action instead of judging every single token.

#Turn-PPO #multi-turn reinforcement learning #agentic LLMs
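As a rough illustration of treating each turn as one action, the sketch below computes advantages per turn and then copies each turn's advantage to all of its tokens. The use of generalized advantage estimation (GAE), the function names `turn_level_advantages` and `broadcast_to_tokens`, and the default `gamma` and `lam` values are assumptions for illustration; the summary above does not specify the paper's exact estimator.

```python
def turn_level_advantages(turn_rewards, turn_values, gamma=1.0, lam=0.95):
    """GAE over turns: each turn is one action with one reward and one value.

    Hypothetical helper; the value after the final turn is assumed to be 0.
    """
    advantages = [0.0] * len(turn_rewards)
    running = 0.0
    for i in reversed(range(len(turn_rewards))):
        next_value = turn_values[i + 1] if i + 1 < len(turn_values) else 0.0
        # Temporal-difference error at the turn level.
        delta = turn_rewards[i] + gamma * next_value - turn_values[i]
        running = delta + gamma * lam * running
        advantages[i] = running
    return advantages


def broadcast_to_tokens(advantages, turn_lengths):
    """Give every token in a turn that turn's advantage."""
    return [adv for adv, n in zip(advantages, turn_lengths) for _ in range(n)]


# Three turns, reward only at the end, with turn lengths of 2, 1, and 3 tokens.
adv = turn_level_advantages([0.0, 0.0, 1.0], [0.5, 0.5, 0.5], gamma=1.0, lam=1.0)
token_adv = broadcast_to_tokens(adv, [2, 1, 3])
print(adv, token_adv)
```

In a token-level setup, the same recursion would run over every token instead, so the number of steps between an action and the final reward grows with response length rather than with the number of turns.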