C-RADIOv4 is a single vision model that learns from several expert models at once and keeps their best skills while staying fast.
DanQing is an up-to-date, 100-million-pair Chinese image–text dataset, collected from 2024–2025 web pages and carefully cleaned, for training AI that understands pictures and Chinese text together.
NitroGen is a vision-to-action AI that learns to play many video games by watching 40,000 hours of gameplay videos, drawn from over 1,000 titles, that show on-screen controller overlays.
This paper shows a simple way to turn any strong autoregressive (step-by-step) model into a diffusion vision-language model (parallel, block-by-block) without changing the architecture.
HyperVL is a small but smart model that understands images and text, designed to run fast on phones and tablets.
EMMA is a single, unified AI model that can understand images, write about them, create new images from text, and edit images.