This paper shows how to train long-horizon AI agents to recall past information accurately without loading their entire memory into context at once.
Most reinforcement learning agents receive only a binary pass/fail reward, which hides how close their attempts actually came to succeeding.