VINO: A Unified Visual Generator with Interleaved OmniModal Context
Beginner · Junyi Chen, Tong He et al. · Jan 5 · arXiv
VINO is a single AI model that can generate and edit both images and videos, guided at the same time by text instructions and by reference images and video clips.
#VINO · #unified visual generator · #multimodal diffusion transformer