AgentIF-OneDay: A Task-level Instruction-Following Benchmark for General AI Agents in Daily Scenarios
Beginner | Kaiyuan Chen, Qimin Wu et al. | Jan 28 | arXiv
This paper introduces AgentIF-OneDay, a benchmark that tests whether AI agents can follow everyday instructions the way people actually give them.
#AgentIF-OneDay #instruction following #AI agents