RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System
Beginner · Yinjie Wang, Tianbao Xie et al. · Feb 2 · arXiv
RLAnything is a new reinforcement learning (RL) framework that jointly trains three components in a closed loop: the policy (the agent), the reward model (the judge), and the environment (the tasks).
#reinforcement learning #closed-loop optimization #reward modeling