UniT teaches one multimodal model to think in steps with pictures and words, so it can check its own work and fix mistakes as it goes.
DeepGen 1.0 is a small 5B-parameter model that can both make new images from text prompts and edit existing ones by following text instructions.