Papers4

#τ-bench

CoVe: Training Interactive Tool-Use Agents via Constraint-Guided Verification

Jinpeng Chen, Cheng Gong et al.Mar 2arXiv

CoVe is a way to create training conversations for AI agents that use tools, while guaranteeing the conversations are both challenging and correct.

#constraint-guided verification#multi-turn tool use#user simulator

Not triaged yet

Towards a Science of AI Agent Reliability

Intermediate

Stephan Rabanser, Sayash Kapoor et al.Feb 18arXiv

Accuracy alone can make AI agents look good on paper while still failing in real life; this paper shows how to measure reliability properly.

#AI agent reliability#consistency#robustness

Not triaged yet

D-CORE: Incentivizing Task Decomposition in Large Reasoning Models for Complex Tool Use

Intermediate

Bowen Xu, Shaoyu Wu et al.Feb 2arXiv

This paper fixes a common problem in reasoning AIs called Lazy Reasoning, where the model rambles instead of making a good plan.

#task decomposition#tool use#large reasoning models

Not triaged yet

Unlocking Implicit Experience: Synthesizing Tool-Use Trajectories from Text

Intermediate

Zhihao Xu, Rumei Li et al.Jan 15arXiv

The paper shows a new way to teach AI assistants how to use tools in many-step conversations by mining ordinary text on the internet for step-by-step “how-to” knowledge.

#GEM pipeline#text-based trajectory generation#tool-use data synthesis

Not triaged yet