The paper introduces Trainee-Bench, a new way to test AI agents that feels like a real first day at work, with tasks arriving over time, hidden clues, and changing priorities.
Multi-agent AI teams are not automatically better; their success depends on matching the team's coordination style to the job's structure.