Papers (160)

#reinforcement learning

Search-R2: Enhancing Search-Integrated Reasoning via Actor-Refiner Collaboration

This paper teaches an AI to look things up on the web while it reasons, and to fix its own mistakes mid-thought instead of starting over from scratch.

#search-integrated reasoning #reinforcement learning #credit assignment

Learning Query-Specific Rubrics from Human Preferences for DeepResearch Report Generation

Intermediate

Changze Lv, Jie Zhou et al. · Feb 3 · arXiv

DeepResearch agents write long, evidence-based reports, but teaching and grading them is hard because there is no single 'right answer' to score against.

#DeepResearch #query-specific rubrics #human preference learning

SWE-World: Building Software Engineering Agents in Docker-Free Environments

Intermediate

Shuang Sun, Huatong Song et al. · Feb 3 · arXiv

SWE-World lets code-fixing AI agents practice and learn without heavy Docker containers, using models that simulate the computer and the test results instead of actually running them.

#SWE-World #software engineering agents #Docker-free training

SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training

Intermediate

Huatong Song, Lisheng Huang et al. · Feb 3 · arXiv

SWE-Master is a fully open, step-by-step recipe for turning a regular coding model into a strong software-fixing agent that works across many steps, files, and tests.

#SWE-Master #software engineering agent #long-horizon SFT

Self-Hinting Language Models Enhance Reinforcement Learning

Intermediate

Baohao Liao, Hanze Dong et al. · Feb 3 · arXiv

When rewards are rare, a popular training method for language models (GRPO) often stops learning because every attempt in a group gets the same score, so there is no difference to learn from.

#reinforcement learning #GRPO #self-hinting
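To make the GRPO failure mode described above concrete, here is a minimal Python sketch of group-relative advantages. It assumes the common form A_i = (r_i - mean(r)) / (std(r) + eps) with a population standard deviation; the function name `group_advantages` and the `eps` value are illustrative choices, not taken from the paper. It shows only the problem, not the self-hinting remedy.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantages (assumed form): each reward is normalized
    against the mean and standard deviation of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Sparse 0/1 reward where all four attempts fail:
# every advantage is zero, so the policy gradient carries no signal.
print(group_advantages([0.0, 0.0, 0.0, 0.0]))  # [0.0, 0.0, 0.0, 0.0]

# One success out of four: the successful attempt gets a positive
# advantage and the failures get negative ones, so learning can proceed.
print(group_advantages([1.0, 0.0, 0.0, 0.0]))
```

The same zero result appears when every attempt succeeds, which is why groups with identical scores contribute nothing to the update.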

Neural Predictor-Corrector: Solving Homotopy Problems with Reinforcement Learning

Intermediate

Jiayao Mai, Bangyan Liao et al. · Feb 3 · arXiv

This paper shows that many hard math and AI problems can be solved with one shared idea called homotopy, where we move from an easy version of a problem to the real one step by step.

#homotopy continuation #predictor-corrector #reinforcement learning
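The homotopy idea in the summary above (start from an easy problem and deform it step by step into the real one) can be illustrated with a classical fixed-step predictor-corrector on a toy scalar equation. This is a minimal Python sketch under stated assumptions: the target f(x) = x^3 - 2x - 5, the easy start problem x - x0 = 0, the linear homotopy H(x, t) = (1 - t)(x - x0) + t f(x), and the step counts are all illustrative choices. The paper's reinforcement-learning component is not modeled here.

```python
def f(x):
    # Illustrative target problem (assumption, not from the paper): f(x) = 0
    return x**3 - 2 * x - 5


def df(x):
    return 3 * x**2 - 2


def homotopy_solve(x0, steps=20, newton_iters=3):
    """Track the root of H(x, t) = (1 - t)(x - x0) + t*f(x) from t = 0
    (root is x0) to t = 1 (root of f) with Euler prediction and
    Newton correction at each step."""
    x = x0
    for k in range(steps):
        t, t_next = k / steps, (k + 1) / steps
        # Predictor: one Euler step along dx/dt = -H_t / H_x
        h_t = f(x) - (x - x0)
        h_x = (1 - t) + t * df(x)
        x += (t_next - t) * (-h_t / h_x)
        # Corrector: a few Newton steps on H(x, t_next) = 0
        for _ in range(newton_iters):
            h = (1 - t_next) * (x - x0) + t_next * f(x)
            h_x = (1 - t_next) + t_next * df(x)
            x -= h / h_x
    return x


root = homotopy_solve(1.0)
print(root)  # approximately 2.09455
```

For this toy problem H_x stays positive along the whole path, so the tracked root never turns back; harder problems need adaptive step sizes, which is the kind of decision a learned controller could take over.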

RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System

Beginner

Yinjie Wang, Tianbao Xie et al. · Feb 2 · arXiv

RLAnything is a new reinforcement learning (RL) framework that trains three things together at once: the policy (the agent), the reward model (the judge), and the environment (the tasks).

#reinforcement learning #closed-loop optimization #reward modeling

Training LLMs for Divide-and-Conquer Reasoning Elevates Test-Time Scalability

Intermediate

Xiao Liang, Zhong-Zhi Li et al. · Feb 2 · arXiv

The paper trains language models to solve hard problems by first breaking them into smaller parts and then solving those parts, instead of only thinking in one long chain.

#divide-and-conquer reasoning #chain-of-thought #reinforcement learning

MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents

Intermediate

Haozhen Zhang, Quanyu Long et al. · Feb 2 · arXiv

MemSkill turns memory operations for AI agents into learnable skills instead of fixed, hand-made rules.

#memory skills #LLM agents #skill bank

SWE-Universe: Scale Real-World Verifiable Environments to Millions

Intermediate

Mouxiang Chen, Lei Zhang et al. · Feb 2 · arXiv

SWE-Universe is a factory-like system that turns real GitHub pull requests into safe, repeatable coding practice worlds with automatic checkers.

#SWE-Universe #software engineering agents #pull requests

Kimi K2.5: Visual Agentic Intelligence

Beginner

Kimi Team, Tongtong Bai et al. · Feb 2 · arXiv

Kimi K2.5 is a new open-source AI that can read both text and visuals (images and videos) and act like a team of helpers to finish big tasks faster.

#multimodal learning #vision-language models #joint optimization

Show, Don't Tell: Morphing Latent Reasoning into Image Generation

Intermediate

Harold Haodong Chen, Xinxiang Yin et al. · Feb 2 · arXiv

LatentMorph teaches an image-making AI to quietly think in its head while it draws, instead of stopping to write out its thoughts in words.

#latent reasoning #text-to-image generation #autoregressive models
