Papers (2)

#reinforcement learning with verifiable rewards

Reasoning Core: A Scalable Procedural Data Generation Suite for Symbolic Pre-training and Post-Training

Valentin Lacombe, Valentin Quesnel et al. · Mar 2 · arXiv

Reasoning Core is a tool that automatically creates a huge variety of logic and math puzzles, checks every answer with real solvers, and lets you smoothly dial the difficulty up or down.

#procedural data generation #symbolic reasoning #PDDL planning

SSL: Sweet Spot Learning for Differentiated Guidance in Agentic Optimization

Beginner

Jinyang Wu, Changpeng Yang et al. · Jan 30 · arXiv

Most reinforcement learning agents only get a simple pass/fail reward, which hides how good or bad their attempts really were.

#Sweet Spot Learning #tiered rewards #reinforcement learning with verifiable rewards