The paper tackles how AI agents can research the open web when the answers are hidden inside long, messy videos rather than in text alone.
LongVideoAgent is a team of three AIs that work together to answer questions about hour-long TV episodes without missing small details.