ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning
Intermediate · Xiaoxuan Wang, Han Zhang et al. · Feb 25 · arXiv
This paper examines why training AI agents that act over many steps (such as browsing the web or navigating a house) often becomes unstable and collapses.
#Agentic Reinforcement Learning  #Policy Gradient  #Sequence-level Clipping