The paper tackles understanding very long first-person videos (spanning days to a week) by giving an AI a smarter memory and better tools.
DanQing is a fresh, 100-million-pair Chinese image–text dataset collected from 2024–2025 web pages and carefully cleaned for training AI that understands pictures and Chinese text together.