The Agent's First Day: Benchmarking Learning, Exploration, and Scheduling in the Workplace Scenarios
Beginner · Daocheng Fu, Jianbiao Mei et al. · Jan 13 · arXiv
The paper introduces Trainee-Bench, a benchmark that tests AI agents under conditions resembling a real first day at work: tasks arrive over time, clues are hidden, and priorities change.
#Trainee-Bench #dynamic task scheduling #active exploration