Papers (2)

#cross-domain generalization

Paying Less Generalization Tax: A Cross-Domain Generalization Study of RL Training for LLM Agents

LLM agents are usually trained in a few worlds but asked to work in many different, unseen worlds, which often hurts their performance.

#cross-domain generalization #state information richness #planning complexity

Not triaged yet

One Sample to Rule Them All: Extreme Data Efficiency in RL Scaling

Beginner

Yiyuan Li, Zhen Huang et al., Jan 6, arXiv

This paper shows that training a language model with reinforcement learning on just one super well-designed example can boost reasoning across many school subjects, not just math.

#polymath learning #one-shot reinforcement learning #GRPO

Not triaged yet