Papers (10)

#LLM agents

Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory

Beginner

Zhenting Wang, Huancheng Chen et al. · Mar 4 · arXiv

This paper teaches long-horizon AI agents to recall past experience exactly without stuffing their whole memory into the context at once.

#indexed memory #LLM agents #long-horizon tasks

Tool-R0: Self-Evolving LLM Agents for Tool-Learning from Zero Data

Beginner

Emre Can Acikgoz, Cheng Qian et al. · Feb 24 · arXiv

Tool-R0 teaches a language model to use software tools (like APIs) with zero human-made training data.

#self-play reinforcement learning #tool calling #function calling

Learning Personalized Agents from Human Feedback

Beginner

Kaiqu Liang, Julia Kruk et al. · Feb 18 · arXiv

AI helpers often don’t know new users’ tastes and can’t keep up when those tastes change.

#personalization #human feedback #pre-action clarification

AgenticPay: A Multi-Agent LLM Negotiation System for Buyer-Seller Transactions

Beginner

Xianyang Liu, Shangding Gu et al. · Feb 5 · arXiv

AgenticPay is a safe playground where AI agents practice buying and selling by negotiating in natural language, not just by exchanging numbers.

#multi-agent negotiation #language-mediated bargaining #LLM agents

RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System

Beginner

Yinjie Wang, Tianbao Xie et al. · Feb 2 · arXiv

RLAnything is a new reinforcement learning (RL) framework that trains three things together: the policy (the agent), the reward model (the judge), and the environment (the tasks).

#reinforcement learning #closed-loop optimization #reward modeling

Paying Less Generalization Tax: A Cross-Domain Generalization Study of RL Training for LLM Agents

Beginner

Zhihan Liu, Lin Guan et al. · Jan 26 · arXiv

LLM agents are usually trained in a few environments but asked to work in many different, unseen ones, which often hurts their performance.

#cross-domain generalization #state information richness #planning complexity

Agentic Confidence Calibration

Beginner

Jiaxin Zhang, Caiming Xiong et al. · Jan 22 · arXiv

AI agents often act very sure of themselves even when they are wrong, especially on long, multi-step tasks.

#agentic confidence calibration #holistic trajectory calibration #general agent calibrator

The Confidence Dichotomy: Analyzing and Mitigating Miscalibration in Tool-Use Agents

Beginner

Weihao Xuan, Qingcheng Zeng et al. · Jan 12 · arXiv

This paper studies how tool-using AI agents express their confidence and finds a split: some tools make them overconfident, while others help them stay honest.

#LLM agents #calibration #overconfidence

MemEvolve: Meta-Evolution of Agent Memory Systems

Beginner

Guibin Zhang, Haotian Ren et al. · Dec 21 · arXiv

MemEvolve teaches AI agents not only to remember past experiences but also to improve the way they remember, like a student who upgrades their study habits over time.

#LLM agents #agent memory #meta-evolution

SCOPE: Prompt Evolution for Enhancing Agent Effectiveness

Beginner

Zehua Pei, Hui-Ling Zhen et al. · Dec 17 · arXiv

SCOPE lets AI agents rewrite their own instructions while they are working, so they can fix mistakes and get smarter on the next step, not just the next task.

#prompt evolution #LLM agents #context management