This paper builds UniG2U-Bench, a large benchmark that tests when generating images actually helps models understand images and text together.
DeepGen 1.0 is a small 5B-parameter model that can both generate new images and edit existing ones from text instructions.
This paper introduces Log-linear Sparse Attention (LLSA), a new way for Diffusion Transformers to attend only to the most useful information using a layered search.