This paper solves a big problem in fast image generators: they became quick but lost variety, producing many similar-looking pictures.
PixelGen is a new image generator that works directly with pixels and uses perceptual loss, a training signal based on what looks good to people, to improve quality.
FSVideo is a new image-to-video generator that runs about 42× faster than popular open-source models while keeping similar visual quality.
PromptRL teaches a language model to rewrite prompts while a flow-based image model learns to draw, and both are trained together using the same rewards.
This paper shows a simple, one-model way to dub videos that keeps the new voice and the lip movements naturally in sync.
DenseGRPO teaches image models using lots of small, timely rewards instead of one final score at the end.
This paper shows how a video generator can improve its own videos during sampling, without extra training or outside checkers.
Robots often learn a bad habit called the vision shortcut: they guess the task just by looking, and ignore the words you tell them.
TwinBrainVLA is a robot brain with two halves: a frozen generalist that keeps world knowledge safe and a trainable specialist that learns to move precisely.
ShapeR builds clean, correctly sized 3D objects from messy, casual phone or glasses videos by using images, camera poses, sparse SLAM points, and short text captions together.
This paper teaches video-making AIs to follow real-world physics, so rolling balls roll right and collisions look believable.
HeartMuLa is a family of open-source music AI models that can understand and generate full songs with clear lyrics and strong musical structure.