๐ŸŽ“How I Study AIHISA
๐Ÿ“–Read
๐Ÿ“„Papers๐Ÿ“ฐBlogs๐ŸŽฌCourses
๐Ÿ’กLearn
๐Ÿ›ค๏ธPaths๐Ÿ“šTopics๐Ÿ’กConcepts๐ŸŽดShorts
๐ŸŽฏPractice
๐ŸงฉProblems๐ŸŽฏPrompts๐Ÿง Review
Search
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (4)

#Temporal Grounding

VideoLoom: A Video Large Language Model for Joint Spatial-Temporal Understanding

Intermediate
Jiapeng Shi, Junke Wang et al. · Jan 12 · arXiv

VideoLoom is a single AI model that can tell both when something happens in a video and where it happens, at the pixel level.

#Video Large Language Model #Temporal Grounding #Referring Video Object Segmentation

Zoom-Zero: Reinforced Coarse-to-Fine Video Understanding via Temporal Zoom-in

Intermediate
Xiaoqian Shen, Min-Hung Chen et al. · Dec 16 · arXiv

Zoom-Zero helps AI answer questions about videos by first finding the right moment and then zooming in to double-check tiny details.

#Grounded Video Question Answering #Temporal Grounding #Coarse-to-Fine
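The coarse-to-fine idea behind this card can be illustrated with a toy two-pass search: a cheap scan over subsampled frames to find a rough moment, then a dense "zoom-in" scan around it. This is a generic sketch with synthetic scores, not Zoom-Zero's actual model or training procedure.

```python
# Toy coarse-to-fine temporal localization. The per-frame relevance
# scores are synthetic stand-ins for what a video model would produce.

def coarse_to_fine(scores, stride=4, window=8):
    """Find the best-scoring frame index in two passes.

    Pass 1 (coarse): scan every `stride`-th frame to pick a rough center.
    Pass 2 (fine): re-examine every frame in a window around that center.
    """
    # Coarse pass: cheap scan over a subsampled set of frames.
    coarse_best = max(range(0, len(scores), stride), key=lambda i: scores[i])
    # Fine pass ("zoom in"): dense scan inside the window around the hit.
    lo = max(0, coarse_best - window // 2)
    hi = min(len(scores), coarse_best + window // 2 + 1)
    return max(range(lo, hi), key=lambda i: scores[i])

# Synthetic scores: a near-peak at frame 20 (visible to the coarse pass,
# since 20 % 4 == 0) and the true peak at frame 21 (fine pass only).
scores = [0.1] * 40
scores[20] = 0.8
scores[21] = 0.9
print(coarse_to_fine(scores))  # -> 21
```

The coarse pass alone would return frame 20; only the fine zoom-in recovers the true peak at 21, which is the detail-checking behavior the summary describes.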

Active Video Perception: Iterative Evidence Seeking for Agentic Long Video Understanding

Beginner
Ziyang Wang, Honglu Zhou et al. · Dec 5 · arXiv

Long Video Understanding (LVU) is hard because the important clues are tiny, far apart in time, and buried in hours of mostly unimportant footage. Active Video Perception tackles this with an agent that iteratively plans, observes, and reflects to seek out just the evidence it needs.

#Active Video Perception #Long Video Understanding #Plan-Observe-Reflect

ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning

Intermediate
Yifan Li, Yingda Yin et al. · Dec 2 · arXiv

ReVSeg teaches an AI to segment objects in videos by thinking step-by-step instead of guessing everything at once.

#Reasoning Video Object Segmentation #Vision-Language Models #Temporal Grounding