Papers181

#GRPO

AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning

Mingyang Song, Haoyu Sun et al.Jan 26arXiv

AdaReasoner teaches AI to pick the right visual tools, use them in the right order, and stop using them when they aren’t helping.

#AdaReasoner#dynamic tool orchestration#multimodal large language models

Not triaged yet

Paying Less Generalization Tax: A Cross-Domain Generalization Study of RL Training for LLM Agents

Beginner

Zhihan Liu, Lin Guan et al.Jan 26arXiv

LLM agents are usually trained in a few worlds but asked to work in many different, unseen worlds, which often hurts their performance.

#cross-domain generalization#state information richness#planning complexity

Not triaged yet

PaperSearchQA: Learning to Search and Reason over Scientific Papers with RLVR

Intermediate

James Burgess, Jan N. Hansen et al.Jan 26arXiv

This paper teaches a language-model agent to look up facts in millions of scientific paper summaries and answer clear, single-answer questions.

#RLVR#search agents#PaperSearchQA

Not triaged yet

Typhoon-S: Minimal Open Post-Training for Sovereign Large Language Models

Beginner

Kunat Pipatanakul, Pittawat TaveekitworachaiJan 26arXiv

Typhoon-S is a simple, open recipe that turns a basic language model into a helpful assistant and then teaches it important local skills, all on small budgets.

#Typhoon-S#on-policy distillation#full-logits distillation

Not triaged yet

The Script is All You Need: An Agentic Framework for Long-Horizon Dialogue-to-Cinematic Video Generation

Intermediate

Chenyu Mu, Xin He et al.Jan 25arXiv

This paper teaches AI to turn simple dialogue into full movie scenes by first writing a detailed script and then filming it step by step.

#dialogue-to-video#cinematic script generation#ScripterAgent

Not triaged yet

SAMTok: Representing Any Mask with Two Words

Intermediate

Yikang Zhou, Tao Zhang et al.Jan 22arXiv

SAMTok turns any object’s mask in an image into just two special “words” so language models can handle pixels like they handle text.

#SAMTok#mask tokenizer#residual vector quantization

Not triaged yet

Dancing in Chains: Strategic Persuasion in Academic Rebuttal via Theory of Mind

Beginner

Zhitao He, Zongwei Lyu et al.Jan 22arXiv

Academic rebuttals are not just about being polite; they are about smart, strategic persuasion under hidden information.

#academic rebuttal#theory of mind#strategic persuasion

Not triaged yet

Robust Tool Use via Fission-GRPO: Learning to Recover from Execution Errors

Beginner

Zhiwei Zhang, Fei Zhao et al.Jan 22arXiv

Small AI models often stumble when a tool call fails and then get stuck repeating bad calls instead of fixing the mistake.

#FISSION-GRPO#error recovery#tool use

Not triaged yet

The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models

Beginner

Zanlin Ni, Shenzhi Wang et al.Jan 21arXiv

Diffusion language models can write tokens in any order, but that freedom can accidentally hurt their ability to reason well.

#diffusion language model#arbitrary order generation#autoregressive training

Not triaged yet

InT: Self-Proposed Interventions Enable Credit Assignment in LLM Reasoning

Intermediate

Matthew Y. R. Yang, Hao Bai et al.Jan 20arXiv

The paper introduces Intervention Training (InT), a simple way for a language model to find and fix the first wrong step in its own reasoning using a short, targeted correction.

#Intervention Training#credit assignment#LLM reasoning

Not triaged yet

DARC: Decoupled Asymmetric Reasoning Curriculum for LLM Evolution

Intermediate

Shengda Fan, Xuyan Ye et al.Jan 20arXiv

DARC teaches big language models to get smarter by splitting training into two calm, well-organized steps instead of one chaotic loop.

#DARC#self-play#curriculum learning

Not triaged yet

Think3D: Thinking with Space for Spatial Reasoning

Beginner

Zaibin Zhang, Yuhan Wu et al.Jan 19arXiv

Think3D lets AI models stop guessing from flat pictures and start exploring real 3D space, like walking around a room in a video game.

#Think3D#spatial reasoning#3D reconstruction

Not triaged yet

6 7 8 9 10