JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation
Intermediate | Kai Liu, Jungang Li et al. | Dec 28 | arXiv
JavisGPT is a single multimodal LLM that can both understand sounding videos (video with its accompanying audio) and generate new ones in which the audio and video stay in sync.
#multimodal large language model  #audio-video synchronization  #SyncFusion