The paper tackles understanding very long first-person (egocentric) videos, spanning days to a week, by giving an AI model a smarter memory and better tools.
LongVideoAgent is a team of three AI agents that work together to answer questions about hour-long TV episodes without missing small details.