This paper teaches a language-model agent to explore smarter by combining two ways of learning (on-policy and off-policy) with a simple, self-written memory.
SPARK is a new way to train AI agents that saves compute by exploring more only at the moments that matter most.
Agents often act like tourists without a map: they react to what is in front of them and miss the long-term consequences of their choices.