Papers (3)

#exploration

$π$-StepNFT: Wider Space Needs Finer Steps in Online RL for Flow-based VLAs

Intermediate

Siting Wang, Xiaofeng Wang et al. · Mar 2 · arXiv

Vision-language-action models (VLAs), which drive robots from images and instructions, get stuck following a narrow, fragile path after standard training.

#vision-language-action #flow matching #stochastic differential equations

Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization

Intermediate

Zeyuan Liu, Jeonghye Kim et al. · Feb 26 · arXiv

This paper teaches a language-model agent to explore smarter by combining two ways of learning (on-policy and off-policy) with a simple, self-written memory.

#EMPO #memory-augmented agents #on-policy learning

Unveiling Implicit Advantage Symmetry: Why GRPO Struggles with Exploration and Difficulty Adaptation

Intermediate

Zhiqi Yu, Zhangquan Chen et al. · Feb 5 · arXiv

The paper finds a hidden symmetry in GRPO’s advantage calculation that inadvertently discourages models from exploring new good answers and keeps them from giving easy and hard problems the right emphasis at the right stages of training.

#GRPO #GRAE #A-GRAE
