This paper builds a giant, automatically generated video library called SVG2 that records who is in a video, what they look like, and how they interact over time.
This paper builds a new test, called MURGAT, to check whether AI models can back up each small fact they state with the right part of a video, audio clip, or figure.