This paper shows how to make text-to-video models create clearer, steadier, and more on-topic videos without using any human-labeled ratings.
DreamActor-M2 is a new way to make a still picture move by copying motion from a video while keeping the character’s look the same.
LingBot-World is an open-source world model that turns video generation into an interactive, real-time simulator.
This paper shows how a video generator can improve its own videos during sampling, without extra training or outside checkers.
Cosmos Policy teaches robots to act by fine-tuning a powerful video model in just one training stage, without changing the model’s architecture.
VideoMaMa is a model that turns simple black-and-white object masks into soft, precise cutouts (alpha mattes) for every frame of a video.
Large video generators (diffusion models) create great videos but are slow because they build each video through hundreds of small denoising ("clean-up") steps.
Motive is a new way to figure out which training videos teach an AI how to move things realistically, not just how they look.
MoCha is a new model that replaces a person in a video with a new character using only one mask on one frame and a few reference photos.
RoboVIP is a plug-and-play tool that turns ordinary robot videos into many new, realistic, multi-view training videos without changing the original robot actions.
FlowBlending is a simple way to speed up video diffusion models by smartly choosing when to use a big model and when a small one is enough.
Yume1.5 is a model that turns text or a single image into a living, explorable video world you can move through using the keyboard.