Deep search agents can plan and browse the web in many steps, but they often fail because they donβt notice when their own thinking drifts off-track.
AgentEHR is a new, realistic test that asks AI agents to read messy hospital records and make full clinical decisions, not just look up facts.