๐ŸŽ“How I Study AIHISA
๐Ÿ“–Read
๐Ÿ“„Papers๐Ÿ“ฐBlogs๐ŸŽฌCourses
๐Ÿ’กLearn
๐Ÿ›ค๏ธPaths๐Ÿ“šTopics๐Ÿ’กConcepts๐ŸŽดShorts
๐ŸŽฏPractice
๐ŸงฉProblems๐ŸŽฏPrompts๐Ÿง Review
Search
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers3

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#Temporal Grounding

VideoLoom: A Video Large Language Model for Joint Spatial-Temporal Understanding

Intermediate
Jiapeng Shi, Junke Wang et al.Jan 12arXiv

VideoLoom is a single AI model that can tell both when something happens in a video and where it happens, at the pixel level.

#Video Large Language Model#Temporal Grounding#Referring Video Object Segmentation

Zoom-Zero: Reinforced Coarse-to-Fine Video Understanding via Temporal Zoom-in

Intermediate
Xiaoqian Shen, Min-Hung Chen et al.Dec 16arXiv

Zoom-Zero helps AI answer questions about videos by first finding the right moment and then zooming in to double-check tiny details.

#Grounded Video Question Answering#Temporal Grounding#Coarse-to-Fine

ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning

Intermediate
Yifan Li, Yingda Yin et al.Dec 2arXiv

ReVSeg teaches an AI to segment objects in videos by thinking step-by-step instead of guessing everything at once.

#Reasoning Video Object Segmentation#Vision-Language Models#Temporal Grounding