This paper studies how AI agents get better while they are working, not just whether they finish the job.
SPARK is a new way to train AI agents that saves compute by saving its extra exploration for the most important moments.
Before this work, computer-using AIs mostly copied old examples and struggled with long step-by-step tasks on real computers.
Long AI tasks can go wrong early and keep getting worse, a snowballing of mistakes called the Spiral of Hallucination.
KAGE-Bench is a fast, carefully controlled benchmark that tests how well reinforcement learning (RL) agents trained on pixels handle specific visual changes, like new backgrounds or lighting, while the actual game rules stay the same.
Agents often act like tourists without a map: they react to what they see now and miss long-term consequences.
The paper turns video avatars from passive puppets into active doers that can plan, act, check their own work, and fix mistakes over many steps.