KARL is a smart search helper that learns to look up information step by step and explain answers using the facts it finds.
This paper shows that when a model compares two of its own answers head-to-head, it picks the right one more often than when it judges each answer alone.
This paper teaches long-horizon AI agents to remember everything exactly without stuffing their entire memory into the model's context at once.
This paper teaches AI to name things in pictures very specifically (like “golden retriever” instead of just “dog”) without making more mistakes.
MemSifter is a smart helper that picks out the right memories for a big AI, so the big AI doesn't have to read through everything itself.
MMR-Life is a new test (benchmark) that checks how AI understands everyday situations using several real photos at once.
CoVe is a way to create training conversations for AI agents that use tools, while guaranteeing the conversations are both challenging and correct.
FireRed-OCR turns a general vision-language model into a careful document reader that follows strict rules, so its outputs are usable in the real world.
This paper asks a simple question: does reinforcement learning (RL) truly make medical vision-language models (VLMs) smarter, or does it just help them choose better among answers they already know?
CHIMERA is a small (about 9,000 examples) but very carefully built synthetic dataset that teaches AI to solve hard problems step by step.
SWE-rebench V2 is a giant, language-agnostic automated pipeline that turns real GitHub pull requests into safe, runnable software tasks for training AI coding agents.
SLATE is a new way to teach AI to think step by step while using a search engine, giving feedback at each step instead of only at the end.