How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (3)

#long video understanding

Active Perception Agent for Omnimodal Audio-Video Understanding

Intermediate
Keda Tao, Wenjie Du et al. | Dec 29 | arXiv

This paper introduces OmniAgent, a smart video-and-audio detective that actively decides when to listen and when to look.

#active perception, #omnimodal understanding, #audio-guided event localization

SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning

Intermediate
Jitesh Jain, Jialuo Li et al. | Dec 15 | arXiv

SAGE is a smart video-watching agent that decides when to answer quickly and when to take multiple steps, much as people skim or rewind long videos.

#any-horizon reasoning, #video agents, #temporal grounding

Rethinking Chain-of-Thought Reasoning for Videos

Intermediate
Yiwu Zhong, Zi-Yuan Hu et al. | Dec 10 | arXiv

The paper shows that video AI models do not need long, human-like chains of thought to reason well.

#video reasoning, #chain-of-thought, #concise reasoning