Large language models are great at words, but they struggle to predict what will happen after they act in a changing world.
Action100M is a gigantic video dataset of about 100 million labeled action moments, built automatically from 1.2 million instructional videos.
This paper introduces Web World Models (WWMs), a way to build huge, explorable worlds by putting strict rules in code and letting AI write the fun details.
MMGR is a new benchmark that checks whether AI image and video generators follow real-world rules, not just whether their outputs look pretty.