How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (14)

Filtered by: #temporal coherence

ArtHOI: Articulated Human-Object Interaction Synthesis by 4D Reconstruction from Video Priors

Beginner
Zihao Huang, Tianqi Liu et al. · Mar 4 · arXiv

ArtHOI is a zero-shot method that makes people and everyday articulated objects (like doors, drawers, and fridges) move together realistically, using only a single generated video as guidance.

#articulated human-object interaction · #4D reconstruction · #optical flow segmentation

InfinityStory: Unlimited Video Generation with World Consistency and Character-Aware Shot Transitions

Intermediate
Mohamed Elmoghany, Liangbing Zhao et al. · Mar 4 · arXiv

InfinityStory is a system that can generate very long videos (even hours) in which the world stays consistent and characters transition smoothly between shots.

#long-form video generation · #background consistency · #multi-agent planning

NOVA: Sparse Control, Dense Synthesis for Pair-Free Video Editing

Intermediate
Tianlin Pan, Jiayi Dai et al. · Mar 3 · arXiv

NOVA is a video editor that lets you change a few key frames (sparse control) while it preserves the original motion and background details (dense synthesis).

#video editing · #pair-free training · #sparse control

Causal Motion Diffusion Models for Autoregressive Motion Generation

Intermediate
Qing Yu, Akihisa Watanabe et al. · Feb 26 · arXiv

The paper introduces CMDM, a new way to generate human motions that stay smooth over time and match the meaning of a text prompt.

#causal diffusion · #autoregressive motion generation · #text-to-motion

GEBench: Benchmarking Image Generation Models as GUI Environments

Intermediate
Haodong Li, Jingwei Wu et al. · Feb 9 · arXiv

This paper introduces GEBench, a benchmark that tests whether image generation models can act like real app screens that respond when you click or type.

#GEBench · #GE-Score · #GUI generation

NarraScore: Bridging Visual Narrative and Musical Dynamics via Hierarchical Affective Control

Beginner
Yufan Wen, Zhaocheng Liu et al. · Feb 9 · arXiv

NarraScore turns a video's changing story into a matching soundtrack by using emotion as the bridge.

#video-to-music generation · #affective computing · #valence-arousal

PISCES: Annotation-free Text-to-Video Post-Training via Optimal Transport-Aligned Rewards

Intermediate
Minh-Quan Le, Gaurav Mittal et al. · Feb 2 · arXiv

This paper shows how to train text-to-video models to create clearer, steadier, and more on-topic videos without any human-labeled ratings.

#text-to-video · #optimal transport · #annotation-free

JUST-DUB-IT: Video Dubbing via Joint Audio-Visual Diffusion

Intermediate
Anthony Chen, Naomi Ken Korem et al. · Jan 29 · arXiv

This paper presents a simple, single-model way to dub videos so that the new voice and the lips move together naturally.

#video dubbing · #audio-visual diffusion · #joint generation

Self-Refining Video Sampling

Intermediate
Sangwon Jang, Taekyung Ki et al. · Jan 26 · arXiv

This paper shows how a video generator can improve its own outputs during sampling, without extra training or external verifiers.

#video generation · #flow matching · #denoising autoencoder

RoboVIP: Multi-View Video Generation with Visual Identity Prompting Augments Robot Manipulation

Intermediate
Boyang Wang, Haoran Zhang et al. · Jan 8 · arXiv

RoboVIP is a plug-and-play tool that turns ordinary robot videos into many new, realistic, multi-view training videos without changing the original robot actions.

#robotic manipulation · #video diffusion · #multi-view generation

FlowBlending: Stage-Aware Multi-Model Sampling for Fast and High-Fidelity Video Generation

Intermediate
Jibin Song, Mingi Kwon et al. · Dec 31 · arXiv

FlowBlending is a simple way to speed up video diffusion models by choosing, at each sampling stage, when a big model is needed and when a small one is enough.

#FlowBlending · #stage-aware sampling · #video diffusion

Knot Forcing: Taming Autoregressive Video Diffusion Models for Real-time Infinite Interactive Portrait Animation

Intermediate
Steven Xiao, Xindi Zhang et al. · Dec 25 · arXiv

This paper introduces Knot Forcing, a way to generate talking-head videos live, frame by frame, while keeping them looking great.

#Knot Forcing · #autoregressive video diffusion · #temporal knot