Active Video Perception: Iterative Evidence Seeking for Agentic Long Video Understanding
Beginner | Ziyang Wang, Honglu Zhou et al. | Dec 5 | arXiv
Long Video Understanding (LVU) is hard because the important clues are tiny, far apart in time, and buried in hours of mostly unimportant footage.
#Active Video Perception #Long Video Understanding #Plan-Observe-Reflect