This paper shows that many hard math and AI problems can be solved with one shared idea called homotopy, where we move from an easy version of a problem to the real one step by step.
Turn-PPO is a new way to train chatty AI agents that act over many steps, by judging each conversation turn as one whole action instead of judging every single token.