DrivingGen is a new, all-in-one test that fairly checks how well AI can imagine future driving videos and the motion that goes with them.
SVBench is the first benchmark that checks whether video generation models can show realistic social behavior, not just pretty pictures.
Visual grounding is when an AI finds the exact thing in a picture that a sentence is talking about, and this paper shows today's big vision-language AIs are not as good at it as we thought.
OmniSafeBench-MM is a one-stop, open-source test bench that fairly compares how easily multimodal AI models can be tricked (jailbroken) and how well different defenses prevent it.
This paper introduces AV-SpeakerBench, a new test that checks if AI can truly see, hear, and understand who is speaking, what they say, and when they say it in real videos.