PixelGen is a new image generator that works directly with pixels and uses what-looks-good-to-people guidance (perceptual loss) to improve quality.
OpenVision 3 is a single vision encoder that learns one set of image tokens that work well for both understanding images (like answering questions) and generating images (like making new pictures).
GaMO is a new way to rebuild 3D scenes from just a few photos by expanding each photo’s edges (outpainting) instead of inventing whole new camera views.
This paper makes diffusion-based video super-resolution (VSR) practical for live, low-latency use by removing the need for future frames and cutting denoising from ~50 steps down to just 4.
Deepfakes are getting so good that simple yes/no detectors are failing, especially when attackers add tiny, invisible changes.
This paper checks if a popular text-to-image model called Nano Banana Pro can fix messy photos without any extra training.
SS4D is a new AI model that turns a short single-camera video into a full 3D object that moves over time (that’s 4D), and it does this in about 2 minutes.
Steer3D lets you change a 3D object just by typing what you want, like “add a roof rack,” and it does it in one quick pass.
DuetSVG is a new AI that learns to make SVG graphics by generating an image and the matching SVG code together, like sketching first and then tracing neatly.
SHARP turns a single photo into a 3D scene you can look around in, and it does this in under one second on a single GPU.
Most image-similarity tools only notice how things look (color, shape, class) and miss deeper, human-like connections.