Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization
Intermediate · Zeyuan Liu, Jeonghye Kim et al. · Feb 26 · arXiv
This paper teaches a language-model agent to explore smarter by combining two ways of learning (on-policy and off-policy) with a simple, self-written memory.
#EMPO #memory-augmented agents #on-policy learning