How I Study AI - Learn AI Papers & Lectures the Easy Way

Unveiling Implicit Advantage Symmetry: Why GRPO Struggles with Exploration and Difficulty Adaptation

Intermediate

Zhiqi Yu, Zhangquan Chen et al. · Feb 5 · arXiv

The paper identifies a hidden symmetry in GRPO’s advantage calculation that inadvertently keeps models from exploring new good answers and from giving easy and hard problems the right amount of attention at the right times.

#GRPO #GRAE #A-GRAE

Training LLMs for Divide-and-Conquer Reasoning Elevates Test-Time Scalability

Intermediate

Xiao Liang, Zhong-Zhi Li et al. · Feb 2 · arXiv

The paper trains language models to solve hard problems by first breaking them into smaller parts and then solving those parts, instead of only thinking in one long chain.

#divide-and-conquer reasoning #chain-of-thought #reinforcement learning

GTR-Turbo: Merged Checkpoint is Secretly a Free Teacher for Agentic VLM Training

Intermediate

Tong Wei, Yijun Yang et al. · Dec 15 · arXiv

GTR-Turbo teaches a vision-language agent using a 'free teacher' made by merging its own past checkpoints, so no costly external model is needed.

#GTR-Turbo #checkpoint merging #TIES-merging
