RLAnything is a new reinforcement learning (RL) framework that trains three things at once: the policy (the agent), the reward model (the judge), and the environment (the tasks).
LLM agents are usually trained in a few worlds but asked to work in many different, unseen worlds, which often hurts their performance.
AI agents often act very sure of themselves even when they are wrong, especially on long, multi-step tasks.
This paper studies how AI agents that use tools talk about how sure they are and finds a split: some tools make them too sure, others help them be honest.
MemEvolve teaches AI agents not only to remember past experiences but also to improve the way they remember, like a student who upgrades their study habits over time.
SCOPE lets AI agents rewrite their own instructions while they are working, so they can fix mistakes and get smarter on the next step, not just the next task.