This paper introduces OmniAgent, a smart video-and-audio detective that actively decides when to listen and when to look.
T2AV-Compass is a new, unified test to fairly grade AI systems that turn text into matching video and audio.
VABench is a new, all-in-one test that checks how well AI makes videos whose sound matches the picture.