Papers2

#multimodal pretraining

Beyond Language Modeling: An Exploration of Multimodal Pretraining

Shengbang Tong, David Fan et al.Mar 3arXiv

The paper trains one model from scratch to both read text and see images/videos, instead of starting from a language-only model.

#multimodal pretraining#representation autoencoder#RAE

DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset

Intermediate

Hengyu Shen, Tiancheng Gu et al.Jan 15arXiv

DanQing is a fresh, 100-million-pair Chinese image–text dataset collected from 2024–2025 web pages and carefully cleaned for training AI that understands pictures and Chinese text together.

#DanQing#Chinese vision-language dataset#image-text pairs