Papers 1262

daVinci-Dev: Agent-native Mid-training for Software Engineering

This paper teaches code models to work more like real software engineers by mid-training them on data drawn from real development workflows.

#agentic mid-training #agent-native data #contextually-native trajectories

TriPlay-RL: Tri-Role Self-Play Reinforcement Learning for LLM Safety Alignment

Intermediate

Zhewen Tan, Wenhan Yu et al. · Jan 26 · arXiv

TriPlay-RL is a three-role self-play training loop (attacker, defender, evaluator) that teaches AI models to be safer with almost no manual labels.

#LLM safety alignment #self-play reinforcement learning #adversarial prompt generation

TAM-Eval: Evaluating LLMs for Automated Unit Test Maintenance

Intermediate

Elena Bruches, Vadim Alperovich et al. · Jan 26 · arXiv

This paper introduces TAM-Eval, a new way to test how well AI models can create, fix, and update unit tests for real software projects.

#unit test maintenance #LLM for software engineering #reference-free evaluation

Yunjue Agent Tech Report: A Fully Reproducible, Zero-Start In-Situ Self-Evolving Agent System for Open-Ended Tasks

Intermediate

Haotian Li, Shijun Yang et al. · Jan 26 · arXiv

This paper builds an AI agent that learns new skills while working, like a kid who learns new tricks during recess without a teacher telling them what to do.

#in-situ self-evolution #tool evolution #parallel batch evolution

Paying Less Generalization Tax: A Cross-Domain Generalization Study of RL Training for LLM Agents

Beginner

Zhihan Liu, Lin Guan et al. · Jan 26 · arXiv

LLM agents are usually trained in a few worlds but asked to work in many different, unseen worlds, which often hurts their performance.

#cross-domain generalization #state information richness #planning complexity

PaperSearchQA: Learning to Search and Reason over Scientific Papers with RLVR

Intermediate

James Burgess, Jan N. Hansen et al. · Jan 26 · arXiv

This paper teaches a language-model agent to look up facts in millions of scientific paper summaries and answer clear, single-answer questions.

#RLVR #search agents #PaperSearchQA

SAGE: Steerable Agentic Data Generation for Deep Search with Execution Feedback

Intermediate

Fangyuan Xu, Rujun Han et al. · Jan 26 · arXiv

SAGE is a two-agent system that automatically writes tough, multi-step search questions and checks them by actually trying to solve them.

#deep search #agentic data generation #execution feedback

VIBEVOICE-ASR Technical Report

Beginner

Zhiliang Peng, Jianwei Yu et al. · Jan 26 · arXiv

VIBEVOICE-ASR is a single-pass system that listens to up to 60 minutes of audio at once and outputs who spoke, when they spoke, and what they said in one stream.

#long-form ASR #speaker diarization #timestamping

Agentic Very Long Video Understanding

Intermediate

Aniket Rege, Arka Sadhu et al. · Jan 26 · arXiv

The paper tackles the understanding of very long first-person videos (spanning days to a week) by giving an AI a smarter memory and better tools.

#entity scene graph #agentic planning #long-horizon video understanding

FP8-RL: A Practical and Stable Low-Precision Stack for LLM Reinforcement Learning

Intermediate

Zhaopeng Qiu, Shuang Yu et al. · Jan 26 · arXiv

The paper shows how to speed up reinforcement learning (RL) for large language models (LLMs) by using lower-precision 8-bit floating-point numbers (FP8) without breaking training.

#FP8 quantization #LLM reinforcement learning #KV-cache

DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints

Intermediate

Yinger Zhang, Shutong Jiang et al. · Jan 26 · arXiv

DeepPlanning is a new benchmark that tests whether AI can make long, realistic plans that fit time and money limits.

#long-horizon planning #agentic tool use #global constrained optimization

Typhoon-S: Minimal Open Post-Training for Sovereign Large Language Models

Beginner

Kunat Pipatanakul, Pittawat Taveekitworachai · Jan 26 · arXiv

Typhoon-S is a simple, open recipe that turns a basic language model into a helpful assistant and then teaches it important local skills, all on small budgets.

#Typhoon-S #on-policy distillation #full-logits distillation
