PixelGen is a new image generator that works directly with pixels and uses what-looks-good-to-people guidance (perceptual loss) to improve quality.
Before this work, most text-to-image models generated in a compressed VAE latent space (small, squished image codes) and struggled with slow training and overfitting on high-quality fine-tuning sets.
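The perceptual guidance mentioned above boils down to comparing images in a feature space instead of raw pixel space. A minimal sketch, assuming simple hand-crafted edge filters as the "features" (real perceptual losses typically use a pretrained network, and the function names here are illustrative, not the paper's):

```python
import numpy as np

def conv2d(img, kernel):
    # Naive valid-mode 2D convolution over a grayscale image.
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

# Two fixed "feature extractors": vertical and horizontal edge detectors.
EDGE_KERNELS = [
    np.array([[1.0, 0.0, -1.0]] * 3),
    np.array([[1.0, 0.0, -1.0]] * 3).T,
]

def perceptual_loss(a, b):
    # Mean squared error between feature responses, not between raw pixels.
    # Identical images score 0; structural differences score > 0.
    return sum(
        np.mean((conv2d(a, k) - conv2d(b, k)) ** 2) for k in EDGE_KERNELS
    )
```

Two pixel-identical images give a loss of zero, while an image whose edges have moved scores higher, which is the sense in which the loss rewards "what looks right" rather than exact pixel matches.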
FOFPred is a new AI that reads one or two images plus a short instruction like “move the bottle left to right,” and then predicts how every pixel will move in the next moments.
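Predicting "how every pixel will move" is usually expressed as a dense optical-flow field: one (dy, dx) displacement per pixel. A minimal sketch of applying such a field to an image (the integer-displacement simplification and names are assumptions for illustration, not FOFPred's method):

```python
import numpy as np

def apply_flow(img, flow):
    # Move each pixel by its predicted (dy, dx) displacement.
    # flow has shape (H, W, 2); pixels pushed off-frame are dropped,
    # and later writes win if two pixels land on the same target.
    h, w = img.shape
    out = np.zeros_like(img)
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y, x]
            ny, nx = y + int(dy), x + int(dx)
            if 0 <= ny < h and 0 <= nx < w:
                out[ny, nx] = img[y, x]
    return out
```

An instruction like "move the bottle left to right" would correspond to a flow field with positive dx over the bottle's pixels and near-zero flow elsewhere.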
FlashPortrait makes talking-portrait videos that keep a person’s identity steady for as long as you want—minutes or even hours.
The paper tackles a paradox: visual tokenizers that get great pixel reconstructions often make worse images when used for generation.
The paper teaches a video generator to move things realistically by borrowing motion knowledge from a strong video tracker.
UniUGP is a single system that learns to understand road scenes, explain its thinking, plan safe paths, and even imagine future video frames.