CL-bench is a new test that checks whether AI can truly learn new things from the information you give it right now, not just from what it memorized before.
This paper builds a new test called AgentIF-OneDay that checks if AI helpers can follow everyday instructions the way people actually give them.
Text-to-image models draw pretty pictures, but often put things in the wrong places or miss how objects interact.
Robots need generated videos that not only look pretty but also follow real-world physics and show the requested task actually being completed.
SVBench is the first benchmark that checks whether video generation models can show realistic social behavior, not just pretty footage.