Zoom-Zero: Reinforced Coarse-to-Fine Video Understanding via Temporal Zoom-in
Intermediate · Xiaoqian Shen, Min-Hung Chen et al. · Dec 16 · arXiv
Zoom-Zero helps AI answer questions about videos by first finding the right moment and then zooming in to double-check tiny details.
#Grounded Video Question Answering #Temporal Grounding #Coarse-to-Fine