Spatia is a video generator that keeps a live 3D map of the scene (a point cloud) as its memory while making videos.
IC-Effect is a new way to add special effects to existing videos by following a text instruction while keeping everything else unchanged.
This paper turns one flat picture into a neat stack of see-through layers, so you can edit one thing without messing up the rest.
Steer3D lets you change a 3D object just by typing what you want, like “add a roof rack,” and it does it in one quick pass.
Robots often see the world as flat pictures but must move in a 3D world, which makes accurate actions hard.
Big text-to-image models make amazing pictures but are slow because they take hundreds of tiny steps to turn noise into an image.
This paper shows you can train a big text-to-image diffusion model directly on the features of a vision foundation model (like DINOv3) without using a VAE.
Omni-Attribute is a new image encoder that captures just the attributes of a picture you ask for (like hairstyle or lighting) and ignores the rest.
UniUGP is a single system that learns to understand road scenes, explain its thinking, plan safe paths, and even imagine future video frames.
OmniPSD is a new AI model that can both generate layered Photoshop (PSD) files from text prompts and take apart a flat image into clean, editable layers.
TreeGRPO teaches image generators using a branching tree, so each training run produces many useful learning signals instead of just one.
Saber is a new way to make videos that match a text description while keeping the look of people or objects from reference photos, without needing special triplet datasets.