This paper introduces GEBench, a new test that checks whether image generation models can act like real app screens, updating correctly when a user clicks or types.
RISE-Video is a new test that checks whether video-making AIs follow hidden world rules, not just whether they make pretty pictures.