Scaling Zero-Shot Reference-to-Video Generation
Intermediate | Zijian Zhou, Shikun Liu et al. | Dec 7 | arXiv
Saber is a zero-shot method for generating videos that follow a text description while preserving the appearance of people or objects shown in reference photos, without requiring purpose-built triplet datasets of reference images, videos, and text.
#reference-to-video generation #zero-shot video synthesis #masked training