ArtHOI is a new zero-shot method that makes people and everyday articulated objects (like doors, drawers, and fridges) move together realistically using only a single generated video as guidance.
InfinityStory is a new system that can make very long videos (even hours long) in which the world stays consistent and characters transition smoothly between shots.
NOVA is a new video editor that lets you change a few key frames (sparse control) while it carefully keeps the original motion and background details (dense synthesis).
This paper introduces CMDM, a new way to generate human motions that stay smooth over time and match the meaning of a text prompt.
This paper introduces GEBench, a new benchmark that checks whether image generation models can act like real app screens that change when you click or type.
NarraScore turns a video's changing story into a matching soundtrack by using emotion as the bridge.
This paper shows how to make text-to-video models create clearer, steadier, and more on-topic videos without using any human-labeled ratings.
This paper shows a simple, one-model way to dub videos that keeps the new voice and the lip movements naturally in sync.
This paper shows how a video generator can improve its own videos during sampling, without extra training or outside checkers.
RoboVIP is a plug-and-play tool that turns ordinary robot videos into many new, realistic, multi-view training videos without changing the original robot actions.
FlowBlending is a simple way to speed up video diffusion models by smartly choosing when to use a big model and when a small one is enough.
This paper introduces Knot Forcing, a way to make talking-head videos that look great while being generated live, frame by frame.