This paper asks whether training a generative model benefits more from an encoder’s big-picture meaning (global semantics) or from how its features are arranged across space (spatial structure).
This paper shows that a single attention layer is enough to turn large, semantically rich vision features into a compact, easy-to-use code for image generation.
SpaceControl lets you steer a powerful 3D generator with simple shapes you draw, without retraining the model.