Active Perception Agent for Omnimodal Audio-Video Understanding
Intermediate | Keda Tao, Wenjie Du et al. | Dec 29 | arXiv
This paper introduces OmniAgent, an active perception agent for audio-video understanding that decides when to listen and when to look.
#active perception  #omnimodal understanding  #audio-guided event localization