The paper introduces Trainee-Bench, a new way to test AI agents that feels like a real first day at work, with tasks arriving over time, hidden clues, and changing priorities.
This paper shows how to give AI a steady “mental map” of the world that keeps updating even when the camera looks away.
WebOperator has an AI agent use a map of choices (a search tree) to navigate websites safely and reach its goals.
Multi-agent AI teams are not automatically better; their success depends on matching the team’s coordination style to the job’s structure.