Papers3

#multimodal alignment

OmniTransfer: All-in-one Framework for Spatio-temporal Video Transfer

Pengze Zhang, Yanze Wu et al.Jan 20arXiv

OmniTransfer is a single system that learns from a whole reference video, not just one image, so it can copy how things look (identity and style) and how they move (motion, camera, effects).

#spatio-temporal video transfer#identity transfer#style transfer

Not triaged yet

KlingAvatar 2.0 Technical Report

Intermediate

Kling Team, Jialu Chen et al.Dec 15arXiv

KlingAvatar 2.0 is a system that makes long, sharp, lifelike talking-person videos that follow audio, images, and text instructions all at once.

#audio-driven avatar#video diffusion#diffusion transformer

Not triaged yet

Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling

Beginner

Yuran Wang, Bohan Zeng et al.Dec 14arXiv

Scone is a new AI method that makes images from instructions while correctly picking the right subject even when many look similar.

#subject-driven image generation#multi-subject composition#subject distinction

Not triaged yet