RIVER Bench is a new test that checks how well AI can watch a video stream and talk with you in real time.
Molmo2 is a family of vision-language models, released with fully open weights, data, and code, that can watch videos, understand them, and point to or track things over time.
This paper builds a new test, LongShOTBench, to check if AI can truly understand long videos by using sight, speech, and sounds together.