LLaDA-o is a new AI model that understands both images and text and can also generate images, all within a single model.
This paper shows a simple way to turn any strong autoregressive model (which generates output step by step) into a diffusion vision-language model (which generates output in parallel, block by block) without changing the architecture.
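
The summary does not say how the conversion works, so the following is only an illustrative sketch of the step-by-step versus block-by-block contrast, not the paper's method. It assumes the difference can be pictured as an attention mask: a causal mask lets each position see only itself and earlier positions, while a block-wise mask lets each position see its whole block plus all earlier blocks. The function names and the `block_size` parameter are hypothetical.

```python
def causal_mask(n):
    """Step-by-step view: mask[i][j] is True when position i may attend to j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def block_causal_mask(n, block_size):
    """Block-by-block view (assumed for illustration, not taken from the paper):
    position i may attend to every position in its own block or in an earlier block."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [
        [(j // block_size) <= (i // block_size) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Print both masks for 4 positions, using 1 for "may attend" and 0 otherwise.
    for name, mask in [
        ("causal", causal_mask(4)),
        ("block-causal (block_size=2)", block_causal_mask(4, 2)),
    ]:
        print(name)
        for row in mask:
            print(" ".join("1" if allowed else "0" for allowed in row))
```

With `block_size=1` the block-wise mask reduces to the ordinary causal mask. Both masks apply to the same network layers, which is one way to picture a change in decoding style that leaves the architecture untouched.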