RLAnything is a new reinforcement learning (RL) framework that jointly trains three components: the policy (the agent), the reward model (the judge), and the environment (the tasks).
Most RL agents receive only a simple pass/fail reward, which hides how good or bad each attempt really was.
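To make the pass/fail limitation concrete, here is a minimal sketch that scores two attempts at a ten-step task. The function names, the step counts, and the "fraction of correct steps" score are all illustrative assumptions, not RLAnything's actual reward design.

```python
def binary_reward(steps_correct: int, total_steps: int) -> float:
    """Pass/fail reward: 1.0 only if every step succeeded, else 0.0."""
    return 1.0 if steps_correct == total_steps else 0.0


def graded_reward(steps_correct: int, total_steps: int) -> float:
    """Hypothetical finer-grained reward: fraction of steps done correctly."""
    return steps_correct / total_steps


TOTAL_STEPS = 10  # assumed task length, for illustration only

# Two made-up attempts: one almost succeeds, one gets nothing right.
attempts = {"near_miss": 9, "total_failure": 0}

binary = {name: binary_reward(n, TOTAL_STEPS) for name, n in attempts.items()}
graded = {name: graded_reward(n, TOTAL_STEPS) for name, n in attempts.items()}

for name in attempts:
    print(f"{name}: binary={binary[name]}, graded={graded[name]}")
```

Under the binary reward both attempts receive the same score, so the agent gets no signal that the near miss was far closer to success. The graded score separates them, which is the kind of information a pass/fail signal discards.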