Papers (2)

#state tracking

AgentLongBench: A Controllable Long Benchmark For Long-Contexts Agents via Environment Rollouts

Shicheng Fang, Yuxin Wang et al. · Jan 28 · arXiv

AgentLongBench is a new benchmark that tests how well AI agents reason over very long interaction histories built from their own actions and the environment's responses, rather than over static documents.

#AgentLongBench #long-context agents #environment rollouts


User-Oriented Multi-Turn Dialogue Generation with Tool Use at scale

Intermediate

Jungho Cho, Minbyul Jeong et al. · Jan 13 · arXiv

The paper presents a new method for generating realistic, long conversations between people and AI, in which tools such as databases are used.

#multi-turn dialogue generation #tool use #user simulation
