How I Study AI - Learn AI Papers & Lectures the Easy Way

DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset

Intermediate

Hengyu Shen, Tiancheng Gu et al. | Jan 15 | arXiv

DanQing is a fresh, 100-million-pair Chinese image–text dataset collected from 2024–2025 web pages and carefully cleaned for training AI that understands pictures and Chinese text together.

#DanQing #Chinese vision-language dataset #image-text pairs
