UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation
Beginner · Jiehui Huang, Yuechen Zhang et al. · Dec 8 · arXiv
UnityVideo is a single, unified model that learns jointly from several kinds of video information (RGB color, depth, optical flow for motion, body pose, skeletons, and segmentation) to generate more realistic, world-aware videos.
#multimodal video generation #multi-task learning #dynamic noise scheduling