Papers (6)

#reinforcement learning with verifiable rewards

Reasoning Core: A Scalable Procedural Data Generation Suite for Symbolic Pre-training and Post-Training

Valentin Lacombe, Valentin Quesnel et al. · Mar 2 · arXiv

Reasoning Core procedurally generates a wide variety of logic and math problems, verifies every answer with external solvers, and lets users adjust the difficulty continuously up or down.

#procedural data generation #symbolic reasoning #PDDL planning

Not triaged yet
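As a toy illustration of the generate, verify, and adjust-difficulty loop described for Reasoning Core, the sketch below produces nested arithmetic tasks whose nesting depth acts as the difficulty knob, and checks answers with an independent evaluator. Everything here (`make_expr`, `solve`, `generate_task`, `verify`, and the arithmetic task family itself) is hypothetical and is not Reasoning Core's API; its real task families (such as PDDL planning) and external solvers are far richer.

```python
import ast
import operator
import random

# Hypothetical toy task family: nested integer arithmetic.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}
_SYMBOLS = ["+", "-", "*"]


def make_expr(rng: random.Random, depth: int) -> str:
    """Build a random expression; `depth` is the difficulty knob (2**depth - 1 operators)."""
    if depth == 0:
        return str(rng.randint(1, 9))
    op = rng.choice(_SYMBOLS)
    return f"({make_expr(rng, depth - 1)} {op} {make_expr(rng, depth - 1)})"


def solve(expr: str) -> int:
    """Stand-in for an external solver: evaluate the expression tree exactly."""
    def ev(node: ast.AST) -> int:
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return node.value
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)


def generate_task(seed: int, difficulty: int) -> dict:
    """Seeded, so the same (seed, difficulty) pair always yields the same task."""
    rng = random.Random(seed)
    expr = make_expr(rng, depth=difficulty)
    return {"prompt": f"Compute {expr}", "answer": solve(expr)}


def verify(task: dict, candidate: str) -> bool:
    """Verifiable reward: compare a model's answer with the solver's answer."""
    try:
        return int(candidate.strip()) == task["answer"]
    except ValueError:
        return False


task = generate_task(seed=0, difficulty=3)
print(task["prompt"], "->", task["answer"])
```

Raising `difficulty` by one doubles the number of leaves, so difficulty can be dialed in small steps while every generated task stays automatically checkable.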

SSL: Sweet Spot Learning for Differentiated Guidance in Agentic Optimization

Beginner

Jinyang Wu, Changpeng Yang et al. · Jan 30 · arXiv

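Most reinforcement learning agents receive only a binary pass/fail reward, which hides how close their attempts actually came to succeeding. Sweet Spot Learning instead uses tiered rewards to give the agent differentiated guidance.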

#Sweet Spot Learning #tiered rewards #reinforcement learning with verifiable rewards

Not triaged yet
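To show what a pass/fail reward hides, the sketch below contrasts a binary reward with a tiered reward computed from a graded progress score. The progress measure (fraction of verifiable sub-checks passed), the tier boundaries, and all function names are hypothetical choices for illustration, not SSL's actual reward design.

```python
from bisect import bisect_right

# Hypothetical tier design: progress in [0, 1] is bucketed into five reward levels.
TIER_BOUNDS = [0.25, 0.5, 0.75, 1.0]
TIER_REWARDS = [0.0, 0.25, 0.5, 0.75, 1.0]


def progress_from_checks(checks: list[bool]) -> float:
    """Fraction of (hypothetical) verifiable sub-checks that an attempt passes."""
    if not checks:
        raise ValueError("need at least one check")
    return sum(checks) / len(checks)


def binary_reward(checks: list[bool]) -> float:
    """Pass/fail: 1.0 only if every check passes."""
    return 1.0 if all(checks) else 0.0


def tiered_reward(checks: list[bool]) -> float:
    """Map progress to a discrete tier; only a full pass reaches the top tier."""
    progress = progress_from_checks(checks)
    return TIER_REWARDS[bisect_right(TIER_BOUNDS, progress)]


near_miss = [True, True, True, False]   # 3 of 4 checks pass
far_miss = [False, False, False, True]  # 1 of 4 checks pass
print(binary_reward(near_miss), binary_reward(far_miss))  # 0.0 0.0
print(tiered_reward(near_miss), tiered_reward(far_miss))  # 0.75 0.25
```

Under the binary reward both failed attempts look identical; the tiered reward separates the near miss from the far miss while still reserving the top reward for a full pass.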

BabyVision: Visual Reasoning Beyond Language

Intermediate

Liang Chen, Weichu Xie et al. · Jan 10 · arXiv

BabyVision is a new benchmark that tests whether AI models can solve the same basic visual puzzles that young children can, without relying on language shortcuts.

#BabyVision #visual reasoning #multimodal large language models

Not triaged yet

Step-GUI Technical Report

Intermediate

Haolong Yan, Jia Wang et al. · Dec 17 · arXiv

This paper introduces Step-GUI, a pair of compact yet capable GUI agent models (4B and 8B parameters) that operate phones and computers from screenshots and natural-language instructions.

#GUI automation #multimodal large language models #trajectory-level calibration

Not triaged yet

TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs

Intermediate

Jun Zhang, Teng Wang et al. · Dec 16 · arXiv

TimeLens studies how to teach multimodal LLMs not only what happens in a video but exactly when it happens, a task known as video temporal grounding (VTG).

#video temporal grounding #multimodal large language models #benchmark re-annotation

Not triaged yet
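Video temporal grounding predictions are commonly scored by temporal IoU (the intersection over union of the predicted and ground-truth time spans) and by recall at an IoU threshold. The sketch below implements those generic metrics; it is not necessarily the exact evaluation protocol TimeLens uses, and the function names are illustrative.

```python
Segment = tuple[float, float]  # (start_seconds, end_seconds)


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Overlap of two time spans divided by the length of their union."""
    (ps, pe), (gs, ge) = pred, gt
    if ps > pe or gs > ge:
        raise ValueError("segments must satisfy start <= end")
    inter = max(0.0, min(pe, ge) - max(ps, gs))
    union = (pe - ps) + (ge - gs) - inter
    return inter / union if union > 0 else 0.0


def recall_at_iou(preds: list[Segment], gts: list[Segment], threshold: float) -> float:
    """Fraction of queries whose top-1 prediction reaches the IoU threshold."""
    if len(preds) != len(gts) or not gts:
        raise ValueError("need one prediction per ground-truth segment")
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(gts)


print(temporal_iou((0.0, 10.0), (5.0, 15.0)))  # 5 / 15, about 0.333
```

A prediction of 0 to 10 s against a ground truth of 5 to 15 s overlaps for 5 s out of a 15 s union, so it would not count as a hit at the common 0.5 threshold.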

GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning

Intermediate

Lakshya A Agrawal, Shangyin Tan et al. · Jul 25 · arXiv

GEPA improves AI prompts by having the model read its own outputs, reflect in natural language on what went wrong, and then rewrite its instructions.

#GEPA #reflective prompt evolution #Pareto frontier

Not triaged yet
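The Pareto frontier tag points at keeping a diverse pool of candidate prompts rather than only the single best-on-average one. Below is a simplified sketch of Pareto-style selection, assuming each candidate prompt has a score on every training example: candidates that no other candidate dominates are kept, and the next prompt to reflect on and mutate is drawn from that set. The names and the uniform sampling are assumptions; GEPA's actual selection rule and its LLM-driven reflection step are not reproduced here.

```python
import random


def dominates(a: list[float], b: list[float]) -> bool:
    """a dominates b if it is at least as good on every example and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: dict[str, list[float]]) -> list[str]:
    """Candidates whose per-example score vector no other candidate dominates."""
    return [
        name
        for name, s in scores.items()
        if not any(dominates(other, s) for o, other in scores.items() if o != name)
    ]


def select_parent(scores: dict[str, list[float]], rng: random.Random) -> str:
    """Pick the next prompt to reflect on and mutate (uniform over the front, an assumption)."""
    return rng.choice(pareto_front(scores))


# Per-example scores for three hypothetical prompts on three training examples.
scores = {
    "prompt_A": [1.0, 0.0, 1.0],
    "prompt_B": [0.0, 1.0, 0.0],
    "prompt_C": [0.0, 0.0, 1.0],  # dominated by prompt_A
}
print(pareto_front(scores))  # ['prompt_A', 'prompt_B']
```

Selecting by average score alone would always pick `prompt_A` and discard `prompt_B`, even though `prompt_B` is the only candidate that solves the second example; keeping the front preserves that complementary strategy for later mutation.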