PixelGen is a new image generator that works directly on pixels and improves quality with perceptual loss, a training signal based on what looks good to people.
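To make "perceptual loss" concrete, here is a minimal sketch of the standard recipe, not PixelGen's actual code: compare images in the feature space of a pretrained network rather than pixel by pixel. Using VGG16 as the feature extractor is an assumption for illustration.

```python
import torch
import torchvision.models as models

# Illustrative perceptual loss (VGG16 is an assumption, not necessarily
# what PixelGen uses): compare deep features, not raw pixels.
vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features[:16].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def perceptual_loss(generated: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # Feature-space distance tracks human judgments of image similarity
    # much better than per-pixel distance does.
    return torch.nn.functional.mse_loss(vgg(generated), vgg(target))
```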
This paper shows how a video generator can improve its own videos during sampling, with no extra training and no external checker models.
Alterbute is a diffusion-based method that changes an object's intrinsic attributes (color, texture, material, shape) in a photo while keeping the object's identity and the scene intact.
FOFPred is a new AI that reads one or two images plus a short instruction like “move the bottle left to right,” and then predicts how every pixel will move over the next moments, a dense motion map often called optical flow.
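For a concrete picture of what "how every pixel will move" means, here is a toy sketch (not FOFPred's actual output format) of such a dense motion field:

```python
import numpy as np

# Toy illustration: a flow field stores an (dx, dy) displacement for
# every pixel of the image. Shapes and values here are made up.
H, W = 240, 320
flow = np.zeros((H, W, 2), dtype=np.float32)
flow[..., 0] = 3.0  # "move the bottle left to right": pixels shift 3 px rightward
# A predictor like FOFPred would output a field like this for future timesteps.
```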
Big video generators (diffusion models) create great videos but are too slow, because each of the hundreds of tiny clean-up (denoising) steps is a full pass through a large network.
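To see where the cost comes from, here is a schematic sampling loop (generic, with `model` as a placeholder, not any specific paper's code):

```python
import torch

def sample(model, num_steps=1000, shape=(1, 3, 16, 64, 64)):
    # Schematic diffusion sampling: start from pure noise and repeatedly
    # remove a little of it. The update rule is simplified on purpose.
    x = torch.randn(shape)
    for t in reversed(range(num_steps)):
        x = x - model(x, t)  # one full network pass per step: the bottleneck
    return x
```

Cutting that step count is exactly what fast-sampling work targets.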
VideoAR is a new AI method that generates videos the way a story is written, one frame at a time, while painting each frame's details from coarse to fine.
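A rough pseudostructure of that idea (hypothetical; `model.predict` and the scale list are placeholders, not VideoAR's real interface):

```python
def generate(model, num_frames=16, scales=(8, 16, 32, 64)):
    # Hypothetical sketch: autoregressive over time (frame by frame),
    # coarse-to-fine within each frame (low resolutions first).
    frames = []
    for _ in range(num_frames):
        frame = None
        for res in scales:
            frame = model.predict(context=frames, draft=frame, resolution=res)
        frames.append(frame)
    return frames
```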
This paper shows that the best VAEs for image generation are the ones whose latents neatly separate object attributes, a property called semantic disentanglement.
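One simple way to make "neatly separate object attributes" measurable (an illustrative proxy, not necessarily the paper's metric) is a linear probe: if a plain linear classifier can read an attribute out of the latents, the latents separate it cleanly.

```python
from sklearn.linear_model import LogisticRegression

def linear_probe_score(latents, attribute_labels):
    # latents: (N, D) array of VAE-encoded images;
    # attribute_labels: (N,) ids for one attribute, e.g. object color.
    probe = LogisticRegression(max_iter=1000)
    probe.fit(latents, attribute_labels)
    # High accuracy means the attribute is linearly separable in latent space.
    return probe.score(latents, attribute_labels)
```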
LTX-2 is an open-source model that makes video and sound together from a text prompt, so the picture and audio match in time and meaning.
Over++ is a video AI that composites a foreground onto a background and adds the realistic interaction effects between them, like shadows, splashes, dust, and smoke, without changing the original footage.
EasyV2V is a simple but powerful system that edits videos by following plain-language instructions like “make the shirt blue starting at 2 seconds.”
Latent diffusion models are great at making images but learn the meaning of scenes slowly because their training goal mostly teaches them to clean up noise, not to understand objects and layouts.
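The claim is easiest to see in the training objective itself. In a generic DDPM-style denoising loss (names here are illustrative), the model is graded only on recovering the injected noise, which rewards local clean-up rather than understanding objects and layout:

```python
import torch

def denoising_loss(model, latents, t, alpha_bar):
    # Generic denoising objective with illustrative names: predict the
    # noise that was mixed into the latent at timestep t.
    alpha_bar = torch.as_tensor(alpha_bar)  # signal fraction in [0, 1]
    noise = torch.randn_like(latents)
    noisy = alpha_bar.sqrt() * latents + (1.0 - alpha_bar).sqrt() * noise
    # Nothing here asks about objects or scene layout; the loss only
    # scores low-level noise recovery, which is the paper's point.
    return torch.nn.functional.mse_loss(model(noisy, t), noise)
```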
Steer3D lets you change a 3D object just by typing what you want, like “add a roof rack,” and it does it in one quick pass.