The paper tackles a real-world problem: people often give phone agents short, vague instructions, so the agents must infer the missing details from what they know about the user.
The paper introduces Trainee-Bench, a new way to test AI agents that mimics a real first day at work, with tasks arriving over time, hidden clues, and changing priorities.