FoundationMotion is a fully automatic pipeline that turns raw videos into detailed motion data, captions, and quizzes about how things move.
This paper introduces MMSI-Video-Bench, a large, carefully hand-crafted benchmark that tests how well AI models understand space and motion in videos.