Steer3D lets you change a 3D object just by typing what you want, like “add a roof rack,” and it makes the edit in one quick pass.
Robots often see the world as flat pictures but must move in a 3D world, which makes accurate actions hard.
Big text-to-image models make amazing pictures but are slow because they take hundreds of tiny steps to turn noise into an image.
This paper shows you can train a big text-to-image diffusion model directly on the features of a vision foundation model (like DINOv3) without using a VAE.
Omni-Attribute is a new image encoder that learns just the parts of a picture you ask for (like hairstyle or lighting) and ignores the rest.
UniUGP is a single system that learns to understand road scenes, explain its thinking, plan safe paths, and even imagine future video frames.
OmniPSD is a new AI that can both make layered Photoshop (PSD) files from words and take apart a flat image into clean, editable layers.
TreeGRPO teaches image generators using a smart branching tree so each training run produces many useful learning signals instead of just one.
Saber is a new way to make videos that match a text description while keeping the look of people or objects from reference photos, without needing special triplet datasets.
This paper teaches a computer to turn one single picture into a moving 3D scene that stays consistent from every camera angle.
EMMA is a single AI model that can understand images, write about them, create new images from text, and edit images.
TwinFlow is a new way to make big image models draw great pictures in just one step instead of 40–100 steps.
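Several of the summaries above come down to how many times a generator has to run its network to produce one image. The sketch below is only a toy illustration of that cost, not the method of TwinFlow or any other paper listed here. The `CountingModel` class, the `sample` function, and the fixed `target` are hypothetical stand-ins: the "model" simply points straight from the current sample to a known target, which is the idealized case where one step is enough.

```python
from typing import List


class CountingModel:
    """Toy stand-in for a denoising network (an assumption for illustration).

    It returns the velocity that carries x to a fixed target by time t = 1
    and counts how many times it has been called.
    """

    def __init__(self, target: List[float]) -> None:
        self.target = target
        self.calls = 0

    def velocity(self, x: List[float], t: float) -> List[float]:
        self.calls += 1
        # Straight-line path: the remaining distance divided by the remaining time.
        return [(g - xi) / (1.0 - t) for g, xi in zip(self.target, x)]


def sample(model: CountingModel, noise: List[float], num_steps: int) -> List[float]:
    """Euler integration from t = 0 (noise) to t = 1 (sample) in num_steps steps."""
    x = list(noise)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # the largest t used is 1 - dt, so (1 - t) is never zero
        v = model.velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


noise = [0.3, -1.2, 0.8]
target = [1.0, 0.0, -1.0]

many_step_model = CountingModel(target)
x_many = sample(many_step_model, noise, num_steps=100)

one_step_model = CountingModel(target)
x_one = sample(one_step_model, noise, num_steps=1)

print("network calls, 100-step sampler:", many_step_model.calls)
print("network calls, 1-step sampler:", one_step_model.calls)
```

In this idealized setting both samplers land on the same point, but the first pays for 100 network calls and the second for one. Real diffusion models follow curved paths, so cutting the step count without losing quality takes extra training work, which is the problem the few-step and one-step papers above set out to address.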