AgentLongBench: A Controllable Long Benchmark For Long-Contexts Agents via Environment Rollouts
Intermediate · Shicheng Fang, Yuxin Wang et al. · Jan 28 · arXiv
AgentLongBench is a new benchmark that tests how well AI agents reason over very long interaction histories, built from their own actions and the environment's responses, rather than over static documents.
#AgentLongBench #long-context agents #environment rollouts