This paper asks a simple question with big consequences: can today’s AI models actively explore a new space and build a trustworthy internal map of it?
The paper introduces Trainee-Bench, a new way to test AI agents that feels like a real first day at work, with tasks arriving over time, hidden clues, and changing priorities.