Papers (10)

#tool use

Towards Reducible Uncertainty Modeling for Reliable Large Language Model Agents

Changdae Oh, Seongheon Park et al. · Feb 4 · arXiv

This paper says we should measure an AI agent’s uncertainty across its whole conversation, not just on one final answer.

#uncertainty quantification · #LLM agents · #interactive AI

D-CORE: Incentivizing Task Decomposition in Large Reasoning Models for Complex Tool Use

Intermediate

Bowen Xu, Shaoyu Wu et al. · Feb 2 · arXiv

This paper fixes a common problem in reasoning AIs called Lazy Reasoning, where the model rambles instead of making a good plan.

#task decomposition · #tool use · #large reasoning models

daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently

Intermediate

Mohan Jiang, Dayuan Fu et al. · Feb 2 · arXiv

Long tasks trip up most AIs because they lose track of goals and make small mistakes that snowball over many steps.

#long-horizon agency · #pull request chains · #software evolution

CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty

Intermediate

Johannes Kirmayr, Lukas Stappen et al. · Jan 29 · arXiv

CAR-bench is a new 'driving test' for AI assistants that checks if they can stay careful, honest, and consistent during real back-and-forth conversations in a car.

#LLM agents · #benchmarking · #consistency

User-Oriented Multi-Turn Dialogue Generation with Tool Use at Scale

Intermediate

Jungho Cho, Minbyul Jeong et al. · Jan 13 · arXiv

The paper builds a new way to create realistic, long conversations between people and AI that use tools like databases.

#multi-turn dialogue generation · #tool use · #user simulation

Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models

Intermediate

Junru Lu, Jiarui Qin et al. · Dec 31 · arXiv

Youtu-LLM is a small (1.96B-parameter) language model that was trained from scratch to think, plan, and act like an agent instead of just copying bigger models.

#lightweight LLM · #agentic mid-training · #trajectory data

SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning

Intermediate

Yong Xien Chng, Tao Hu et al. · Dec 30 · arXiv

SenseNova-MARS is a vision-language model that can think step-by-step and use three tools—text search, image search, and image cropping—during its reasoning.

#multimodal agent · #vision-language model · #reinforcement learning

GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators

Intermediate

Jiacheng Guo, Ling Yang et al. · Dec 22 · arXiv

GenEnv is a training system where a student AI and a teacher simulator grow together by exchanging tasks and feedback.

#GenEnv · #co-evolutionary learning · #difficulty-aligned curriculum

Adaptation of Agentic AI

Intermediate

Pengcheng Jiang, Jiacheng Lin et al. · Dec 18 · arXiv

This paper organizes how AI agents learn and improve into one simple map with four roads: A1, A2, T1, and T2.

#agentic AI · #adaptation · #A1 A2 T1 T2

Reinventing Clinical Dialogue: Agentic Paradigms for LLM Enabled Healthcare Communication

Intermediate

Xiaoquan Zhi, Hongke Zhao et al. · Dec 1 · arXiv

Clinical conversations are special because they mix caring feelings with precise medical facts, and old AI systems struggled to do both at once.

#clinical dialogue · #agentic AI · #large language models