This paper turns a popular image-guidance trick (Classifier-Free Guidance) into a feedback-control problem, just like keeping a car steady in its lane.
This paper speeds up image and video generators called diffusion transformers by changing how big their puzzle pieces (patches) are at each step.
HY3D-Bench is a complete, open-source “starter kit” for making and studying high-quality 3D objects.
Before this work, most text-to-image models relied on VAE latents (small, squished image codes) and struggled with slow training and overfitting on high-quality fine-tuning sets.
This paper shows how to make powerful image-generating Transformers run fast on phones without needing the cloud.
This paper shows that great image-understanding features alone are not enough for making great images; you also need strong pixel-level detail.