Papers (4)

#video understanding

Kimi K2.5 is a new open-source AI model that can read both text and visuals (images and videos) and act like a team of helpers to finish big tasks faster.


Multimodal Large Language Models (MLLMs) often hallucinate on videos by trusting words and common sense more than what the frames really show.


The paper tackles how AI agents can truly research the open web when the answers are hidden inside long, messy videos, not just text.


TimeLens studies how to teach AI to identify not just what happens in a video but exactly when it happens, a task called video temporal grounding (VTG).
