Large reasoning models have become very good at thinking step-by-step, but that same skill sometimes makes them too eager to follow harmful instructions.
This paper introduces MATTRL, a way for multiple AI agents to learn from their own conversations at test time using short, reusable text notes instead of retraining their weights.
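To make the idea concrete, here is a minimal Python sketch of test-time learning via reusable text notes. This is an illustration of the general concept, not MATTRL's actual algorithm; `call_model`, the `NoteTakingAgent` class, and the note format are all hypothetical.

```python
# Minimal sketch of test-time learning via reusable text notes (an
# illustration of the concept, not the paper's actual algorithm).
# `call_model` is a hypothetical stand-in for any LLM API.
from typing import List

def call_model(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real API client."""
    raise NotImplementedError

class NoteTakingAgent:
    def __init__(self, name: str):
        self.name = name
        self.notes: List[str] = []  # reusable text notes; weights never change

    def act(self, task: str) -> str:
        # Prepend accumulated notes so past experience shapes the answer.
        context = "\n".join(f"Note: {n}" for n in self.notes)
        return call_model(f"{context}\nTask: {task}\nAnswer:")

    def reflect(self, task: str, outcome: str) -> None:
        # Distill the exchange into a short note for future reuse.
        note = call_model(
            f"Task: {task}\nOutcome: {outcome}\n"
            "Write one short, reusable lesson:"
        )
        self.notes.append(note.strip())
```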
Reinforcement learning (RL) can make big language models smarter, but off-policy training often pushes updates too far from the "safe zone," causing unstable learning.
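For context, a standard way to keep updates near that safe zone is to clip the importance ratio between the new and old policies, as PPO does. The sketch below shows the generic clipped surrogate objective; it illustrates the well-known technique, not necessarily this paper's specific method, and the variable names are illustrative.

```python
import numpy as np

def clipped_surrogate(logp_new, logp_old, advantages, eps=0.2):
    """PPO-style clipped objective: keeps the policy update close to the
    data-collecting ("old") policy, limiting off-policy drift.
    Inputs are per-sample log-probabilities and advantage estimates."""
    ratio = np.exp(logp_new - logp_old)  # importance ratio pi_new / pi_old
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantages
    return np.mean(np.minimum(unclipped, clipped))  # maximize this
```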