DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset
Intermediate | Hengyu Shen, Tiancheng Gu et al. | Jan 15 | arXiv
DanQing is a fresh, 100-million-pair Chinese image-text dataset collected from 2024-2025 web pages and carefully cleaned for training AI that understands pictures and Chinese text together.
#DanQing #Chinese vision-language dataset #image-text pairs