TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs
Intermediate · Jun Zhang, Teng Wang et al. · Dec 16 · arXiv
TimeLens studies how to teach AI not just what happens in a video but exactly when it happens, a task known as video temporal grounding (VTG).
#video temporal grounding · #multimodal large language models · #benchmark re-annotation