Papers127

#reinforcement learning

ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas

Xiaoyu Tian, Haotian Wang et al.Jan 29arXiv

ASTRA is a fully automated way to train tool-using AI agents by making both their practice stories (trajectories) and their practice worlds (environments) without humans in the loop.

#tool-augmented agents#multi-turn decision making#verifiable environments

MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning

Intermediate

Yaorui Shi, Shugui Liu et al.Jan 29arXiv

MemOCR is a new way for AI to remember long histories by turning important notes into a picture with big, bold parts for key facts and tiny parts for details.

#MemOCR#visual memory#adaptive information density

Self-Improving Pretraining: using post-trained models to pretrain better models

Intermediate

Ellen Xiaoqing Tan, Shehzaad Dhuliawala et al.Jan 29arXiv

This paper teaches language models to be safer, more factual, and higher quality during pretraining, not just after, by using reinforcement learning with a stronger model as a helper.

#self-improving pretraining#reinforcement learning#online DPO

Llama-3.1-FoundationAI-SecurityLLM-Reasoning-8B Technical Report

Beginner

Zhuoran Yang, Ed Li et al.Jan 28arXiv

This paper introduces Foundation-Sec-8B-Reasoning, a small (8 billion parameter) AI model that is trained to “think out loud” before answering cybersecurity questions.

#native reasoning#cybersecurity LLM#chain-of-thought

Innovator-VL: A Multimodal Large Language Model for Scientific Discovery

Intermediate

Zichen Wen, Boxue Yang et al.Jan 27arXiv

Innovator-VL is a new multimodal AI model that understands both pictures and text to help solve science problems without needing mountains of special data.

#Innovator-VL#multimodal large language model#scientific reasoning

Towards Pixel-Level VLM Perception via Simple Points Prediction

Intermediate

Tianhui Song, Haoyu Lu et al.Jan 27arXiv

SimpleSeg teaches a multimodal language model to outline objects by writing down a list of points, like connecting the dots, instead of using a special segmentation decoder.

#SimpleSeg#multimodal large language model#decoder-free segmentation

AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning

Intermediate

Mingyang Song, Haoyu Sun et al.Jan 26arXiv

AdaReasoner teaches AI to pick the right visual tools, use them in the right order, and stop using them when they aren’t helping.

#AdaReasoner#dynamic tool orchestration#multimodal large language models

daVinci-Dev: Agent-native Mid-training for Software Engineering

Intermediate

Ji Zeng, Dayuan Fu et al.Jan 26arXiv

This paper teaches code AIs to work more like real software engineers by training them in the middle of their learning using real development workflows.

#agentic mid-training#agent-native data#contextually-native trajectories

Paying Less Generalization Tax: A Cross-Domain Generalization Study of RL Training for LLM Agents

Beginner

Zhihan Liu, Lin Guan et al.Jan 26arXiv

LLM agents are usually trained in a few worlds but asked to work in many different, unseen worlds, which often hurts their performance.

#cross-domain generalization#state information richness#planning complexity

Endless Terminals: Scaling RL Environments for Terminal Agents

Intermediate

Kanishk Gandhi, Shivam Garg et al.Jan 23arXiv

Endless Terminals is an automatic factory that builds thousands of realistic, checkable computer-terminal tasks so AI agents can practice and improve with reinforcement learning.

#reinforcement learning#PPO#terminal agents

Learning to Discover at Test Time

Intermediate

Mert Yuksekgonul, Daniel Koceja et al.Jan 22arXiv

This paper shows how to keep training a language model while it is solving one hard, real problem, so it can discover a single, truly great answer instead of many average ones.

#test-time training#reinforcement learning#entropic objective

Dancing in Chains: Strategic Persuasion in Academic Rebuttal via Theory of Mind

Beginner

Zhitao He, Zongwei Lyu et al.Jan 22arXiv

Academic rebuttals are not just about being polite; they are about smart, strategic persuasion under hidden information.

#academic rebuttal#theory of mind#strategic persuasion

1 2 3 4 5