This paper builds a gigantic library of video puzzles (VBVR) so AI can practice not just making pretty videos, but actually thinking through what happens over time.
DeepPlanning is a new benchmark that tests whether AI can make long, realistic plans that fit time and money limits.