This paper introduces CHAIN, a hands-on 3D playground that tests whether AI can not only see objects but also plan and act under real physics.
This paper introduces GEBench, a new test that checks whether image-generation models can act like real app screens that change when you click or type.