X-Coder shows that models can learn expert-level competitive programming using data that is 100% synthetic—no real contest problems needed.
ArenaRL teaches AI agents by comparing their answers against each other, like a sports tournament, instead of giving each answer a single noisy score.
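
To picture the tournament idea, here is a minimal sketch (not ArenaRL's actual interface; the judge and function names are hypothetical stand-ins): every pair of answers plays a head-to-head match, and each answer's reward is simply its win rate.

```python
import itertools
from typing import Callable, List

def tournament_scores(
    answers: List[str],
    judge: Callable[[str, str], int],  # returns 0 if the first answer wins, 1 if the second wins
) -> List[float]:
    """Score each answer by its win rate in a round-robin tournament.

    Instead of asking a noisy scorer to grade each answer in isolation,
    every pair of answers is compared head-to-head and each answer's
    reward is the fraction of matchups it wins.
    """
    wins = [0] * len(answers)
    for i, j in itertools.combinations(range(len(answers)), 2):
        winner = judge(answers[i], answers[j])
        wins[i if winner == 0 else j] += 1
    matches = len(answers) - 1
    return [w / matches for w in wins]

# Toy judge: prefer the longer answer (a real judge would be an LLM or a verifier).
toy_judge = lambda a, b: 0 if len(a) >= len(b) else 1
print(tournament_scores(["short", "a bit longer", "the longest answer"], toy_judge))
```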
Real instructions often carry logic such as 'and', 'first-then', and 'if-else', and this paper teaches models to notice and obey that logic.
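
To make those kinds of logic concrete, here is a small illustrative checker (invented for this summary, not taken from the paper): 'and' means every part must hold, 'first-then' means order matters, and 'if-else' means which rule applies depends on a condition.

```python
def satisfies_and(text: str, parts) -> bool:
    """'and' logic: every constraint must be satisfied."""
    return all(p(text) for p in parts)

def satisfies_first_then(text: str, first: str, then: str) -> bool:
    """'first-then' logic: both pieces appear, and in the required order."""
    i, j = text.find(first), text.find(then)
    return i != -1 and j != -1 and i < j

def satisfies_if_else(condition: bool, text: str, if_rule, else_rule) -> bool:
    """'if-else' logic: which rule applies depends on the condition."""
    return if_rule(text) if condition else else_rule(text)

reply = "Summary: the results. Sources: [1], [2]."
print(satisfies_and(reply, [lambda t: "Summary" in t, lambda t: "Sources" in t]))  # True
print(satisfies_first_then(reply, "Summary", "Sources"))                           # True
```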
The paper fixes a big problem in training web-searching AI: rewarding only the final answer makes agents cut corners and sometimes hallucinate.
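
A rough sketch of the contrast, with made-up checks rather than the paper's actual reward: an outcome-only reward looks at the final answer alone, while a process-aware reward also asks whether the answer is backed by the evidence the agent actually retrieved.

```python
def outcome_reward(final_answer: str, gold: str) -> float:
    """Outcome-only: 1 if the final answer matches, 0 otherwise.
    An agent can earn this while skipping searches or inventing sources."""
    return 1.0 if final_answer.strip().lower() == gold.strip().lower() else 0.0

def process_aware_reward(final_answer: str, gold: str, cited_snippets) -> float:
    """Also require the answer to be supported by retrieved evidence,
    so shortcuts and hallucinated sources stop paying off."""
    correct = outcome_reward(final_answer, gold)
    grounded = any(final_answer.strip().lower() in s.lower() for s in cited_snippets)
    return 0.5 * correct + 0.5 * (1.0 if grounded else 0.0)

print(process_aware_reward("Paris", "Paris", ["The capital of France is Paris."]))  # 1.0
print(process_aware_reward("Paris", "Paris", []))                                   # 0.5 (right answer, no evidence)
```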
This paper says long chain-of-thought (Long CoT) works best when it follows a 'molecular' pattern with three kinds of thinking bonds: Deep-Reasoning, Self-Reflection, and Self-Exploration.
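
Purely to make the three 'bonds' concrete, here is a toy tagger (invented for this summary) that labels chain-of-thought sentences by cue phrases and reports the mix; the cue lists are assumptions, not the paper's definitions.

```python
import re

# Hypothetical cue phrases for each kind of "thinking bond".
CUES = {
    "Deep-Reasoning":   ("therefore", "because", "it follows"),
    "Self-Reflection":  ("wait", "let me check", "actually"),
    "Self-Exploration": ("alternatively", "another way", "what if"),
}

def tag_bonds(cot: str) -> dict:
    """Count which bond type each sentence of a chain of thought resembles."""
    counts = {bond: 0 for bond in CUES}
    for sentence in re.split(r"[.!?]\s+", cot.lower()):
        for bond, cues in CUES.items():
            if any(c in sentence for c in cues):
                counts[bond] += 1
    return counts

cot = "Because x is even, x = 2k. Wait, let me check that step. Alternatively, what if x is odd?"
print(tag_bonds(cot))  # one sentence of each bond type
```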
EnvScaler is an automatic factory that builds many safe, rule-following practice worlds where AI agents can talk to users and call tools, just like real apps.
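
As a loose illustration of the 'practice world' idea (the class and rules below are invented, not EnvScaler's API), each generated environment pairs tools the agent may call with rules it must not break, and records any violation.

```python
class ToyBookingEnv:
    """A tiny synthetic practice world: the agent can check and book slots,
    but the rules forbid booking a slot that is not available."""

    def __init__(self, slots):
        self.slots = dict(slots)          # slot name -> available?
        self.violations = []              # rule breaks get logged here

    def call_tool(self, name: str, arg: str) -> str:
        if name == "check":
            return "available" if self.slots.get(arg, False) else "unavailable"
        if name == "book":
            if not self.slots.get(arg, False):
                self.violations.append(f"booked unavailable slot {arg}")
                return "error"
            self.slots[arg] = False
            return "booked"
        return "unknown tool"

env = ToyBookingEnv({"mon-9am": True, "mon-10am": False})
print(env.call_tool("check", "mon-10am"))  # unavailable
print(env.call_tool("book", "mon-9am"))    # booked
print(env.violations)                      # []
```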
The paper teaches an AI to act like a careful traveler: it looks at a photo, forms guesses about where it might be, and uses real map tools to check each guess.
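
A schematic of that guess-and-check loop, with an invented geocoding stub standing in for the real map tools the agent would call: guesses that contradict the map facts are thrown out.

```python
def geocode_stub(place: str) -> dict:
    """Stand-in for a real map/geocoding tool the agent would query."""
    fake_db = {
        "Lisbon": {"tram_lines": True, "coastline": True},
        "Vienna": {"tram_lines": True, "coastline": False},
    }
    return fake_db.get(place, {})

def verify_guesses(observed_clues: dict, guesses: list) -> list:
    """Keep only guesses whose map facts match the clues seen in the photo."""
    survivors = []
    for place in guesses:
        facts = geocode_stub(place)
        if facts and all(facts.get(k) == v for k, v in observed_clues.items()):
            survivors.append(place)
    return survivors

# The photo shows a tram and the sea, so only coastal tram cities survive.
print(verify_guesses({"tram_lines": True, "coastline": True}, ["Lisbon", "Vienna"]))  # ['Lisbon']
```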
This paper teaches a camera to fix nighttime colors by combining a smart rule-based color trick (SGP-LRD) with a learning-by-trying helper (reinforcement learning).
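
As a generic illustration only (this is a standard gray-world correction, not the paper's SGP-LRD), the rule-based step removes the colour cast and exposes a per-channel knob that a reinforcement learner could then tune by trial and error.

```python
import numpy as np

def gray_world_correct(img: np.ndarray, learned_gain: np.ndarray) -> np.ndarray:
    """Rule-based step: scale R, G, B so the scene averages to gray,
    then apply a per-channel gain an RL policy could adjust."""
    means = img.reshape(-1, 3).mean(axis=0)
    rule_gains = means.mean() / np.clip(means, 1e-6, None)
    return np.clip(img * rule_gains * learned_gain, 0.0, 1.0)

# A dim, orange-tinted night image (values in [0, 1]); start the learned gain at 1.
night = np.random.rand(8, 8, 3) * np.array([0.4, 0.25, 0.15])
fixed = gray_world_correct(night, learned_gain=np.ones(3))
print(night.reshape(-1, 3).mean(axis=0), "->", fixed.reshape(-1, 3).mean(axis=0))
```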
The paper asks a simple question: do video AIs really need to “think out loud” every time, or can they answer quickly most of the time and think deeply only when needed?
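
One way to picture that fast-or-deliberate choice, using stub model calls and a made-up confidence threshold: answer directly when the quick pass is confident, and fall back to full step-by-step reasoning only when it is not.

```python
def answer_adaptively(question: str, quick_model, thinking_model, threshold: float = 0.8) -> str:
    """Fast path first; spend the expensive chain-of-thought pass only when the
    quick answer's confidence falls below the (hypothetical) threshold."""
    answer, confidence = quick_model(question)
    if confidence >= threshold:
        return answer
    return thinking_model(question)  # slow, step-by-step pass

# Stub models for illustration.
quick = lambda q: ("two people", 0.95) if "how many" in q else ("unsure", 0.3)
deep  = lambda q: "after reasoning over the frames: the red car caused the collision"

print(answer_adaptively("how many people are in the video?", quick, deep))  # fast path
print(answer_adaptively("why did the collision happen?", quick, deep))      # deep path
```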
RelayLLM lets a small model do the talking and only asks a big model for help on a few, truly hard tokens.
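
A toy rendering of the relay idea, assuming the small model exposes a probability for its proposed next token and defers to the big model whenever that probability is low; all names and thresholds here are placeholders, not RelayLLM's actual mechanism.

```python
def relay_generate(prompt: str, small_step, large_step, max_tokens: int = 20,
                   confidence_floor: float = 0.6) -> str:
    """The small model proposes each next token; only low-confidence tokens
    are handed to the large model, so most of the work stays cheap."""
    text = prompt
    for _ in range(max_tokens):
        token, prob = small_step(text)    # small model's best next token + its probability
        if prob < confidence_floor:
            token = large_step(text)      # expensive call, used only for the hard tokens
        if token == "<eos>":
            break
        text += token
    return text

def small(t):
    # Tiny stand-in: confident about filler words, unsure about the factual token.
    if t.endswith("Paris"):
        return "<eos>", 0.9
    if t.endswith("is"):
        return " Paris", 0.2              # low confidence -> relay to the big model
    return " is", 0.9

large = lambda t: " Paris"
print(relay_generate("The capital of France", small, large))  # "The capital of France is Paris"
```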
This survey explains how AI judges are changing from single smart readers (LLM-as-a-Judge) into full-on agents that can plan, use tools, remember, and work in teams (Agent-as-a-Judge).
Long-term AI helpers remember past chats, but using all memories can trap them in old ideas (Memory Anchoring).
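
To make the failure mode concrete (the memory store and filter below are invented for this summary), contrast dumping every past note into the prompt with keeping only the notes relevant to the new request: the dump-everything version keeps resurfacing a stale preference.

```python
memories = [
    "2023-01: user said they love long, formal reports",
    "2024-06: user now prefers short bullet summaries",
    "2024-07: user asked about travel budgets",
]

def build_prompt_all(question: str) -> str:
    """Anchoring-prone: every memory, however old or stale, rides along with the question."""
    return "\n".join(memories) + "\n\nQuestion: " + question

def build_prompt_filtered(question: str, keep: int = 2) -> str:
    """One simple mitigation: keep only recent memories that share words with the question."""
    relevant = [m for m in memories if set(m.lower().split()) & set(question.lower().split())]
    return "\n".join(relevant[-keep:]) + "\n\nQuestion: " + question

q = "write a short summary of the travel budget"
print(build_prompt_all(q))       # still carries the stale 2023 'long, formal reports' note
print(build_prompt_filtered(q))  # keeps only the notes that match the new request
```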