Papers (4)

#cross-modal retrieval

MAEB: Massive Audio Embedding Benchmark

Adnan El Assadi, Isaac Chung et al. · Feb 17 · arXiv

MAEB is a giant, fair report card for audio AI that tests 50+ models on 30 tasks across speech, music, environmental sounds, and audio–text tasks in 100+ languages.

#audio embeddings #MAEB #MTEB

ResearchGym: Evaluating Language Model Agents on Real-World AI Research

Intermediate

Aniketh Garikaparthi, Manasi Patwardhan et al. · Feb 16 · arXiv

ResearchGym is a new "gym" where AI agents are tested on real research projects end to end, not just on toy problems.

#ResearchGym #closed-loop research #objective evaluation

Agentic Very Long Video Understanding

Intermediate

Aniket Rege, Arka Sadhu et al. · Jan 26 · arXiv

The paper tackles understanding super long, first‑person videos (days to a week) by giving an AI a smarter memory and better tools.

#entity scene graph #agentic planning #long-horizon video understanding

DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset

Intermediate

Hengyu Shen, Tiancheng Gu et al. · Jan 15 · arXiv

DanQing is a fresh, 100-million-pair Chinese image–text dataset collected from 2024–2025 web pages and carefully cleaned for training AI that understands pictures and Chinese text together.

#DanQing #Chinese vision-language dataset #image-text pairs
