Searching through videos, images, and long documents is powerful but gets very expensive when every tiny piece is stored separately.
The paper tackles how AI agents can truly research the open web when the answers are hidden inside long, messy videos, not just text.
LongVideoAgent is a team of three AIs that work together to answer questions about hour-long TV episodes without missing small details.