The paper asks AI to hunt for insights in big databases without being told exact questions, like a curious scientist instead of a test-taker.
AgentEHR is a new, realistic test that asks AI agents to read messy hospital records and make full clinical decisions, not just look up facts.
This paper introduces CLINSQL, a 633-task benchmark that turns real clinician-style questions into SQL challenges over the MIMIC-IV v3.1 hospital database.