Large language models learn more effectively when training effort is concentrated on the right questions at the right moments.
This paper introduces LAMER, a Meta-RL training framework that trains language agents to explore first and then exploit what they have learned to solve tasks faster.