ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking
Beginner · Qiang Zhang, Boli Chen et al. · Jan 10 · arXiv
ArenaRL teaches AI agents by comparing their answers against each other, like a sports tournament, instead of giving each answer a single noisy score.
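To make the idea of ranking answers against each other concrete, here is a minimal sketch, not the paper's implementation. It assumes a round-robin format (the summary above does not say which tournament format ArenaRL uses), and the names `tournament_advantages` and `longer_is_better` are hypothetical. In place of a real judge, a toy rule that prefers the longer string stands in. Win counts are standardized into relative advantages, so each answer's training signal depends only on how it fared against its peers.

```python
from itertools import combinations
from typing import Callable, List


def tournament_advantages(
    answers: List[str],
    judge: Callable[[str, str], bool],
) -> List[float]:
    """Compare every pair of answers and turn win counts into
    zero-mean, unit-variance advantages.

    `judge(a, b)` is a hypothetical comparator that returns True
    when answer `a` is preferred over answer `b`.
    """
    n = len(answers)
    if n < 2:
        # Nothing to compare against, so there is no relative signal.
        return [0.0] * n

    wins = [0] * n
    for i, j in combinations(range(n), 2):
        if judge(answers[i], answers[j]):
            wins[i] += 1
        else:
            wins[j] += 1

    mean = sum(wins) / n
    variance = sum((w - mean) ** 2 for w in wins) / n
    std = variance ** 0.5 or 1.0  # fall back to 1.0 if all win counts are equal
    return [(w - mean) / std for w in wins]


def longer_is_better(a: str, b: str) -> bool:
    # Toy stand-in for a real judge (assumption for illustration only).
    return len(a) > len(b)


advantages = tournament_advantages(
    ["ok", "a fuller answer", "short"], longer_is_better
)
```

With the toy judge, the longest answer receives the highest advantage and the shortest the lowest. The absolute scale of any single score never enters the calculation, which is the point of ranking relatively.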
#ArenaRL #reinforcement learning #relative ranking