This paper presents a simple, single-model way to dub videos that keeps the new voice and the speaker's lip movements naturally in sync.
Language models store concepts along straight-line directions inside their internal representations, like sliders for “truth” or “ethics.”
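If a concept really does live along a straight-line direction, then "turning the slider" is just vector addition in the hidden space. Below is a minimal numpy sketch of that idea; the variables (`h`, `v`, `alpha`) and shapes are purely illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical sketch: suppose a concept like "truth" corresponds to a
# linear direction v in a model's hidden space. "Steering" the model
# means adding a scaled copy of v to a hidden-state vector h.
rng = np.random.default_rng(0)
hidden_dim = 8
h = rng.normal(size=hidden_dim)      # a hidden-state vector (illustrative)
v = rng.normal(size=hidden_dim)
v = v / np.linalg.norm(v)            # unit-length "concept direction"

alpha = 2.0                          # steering strength (slider position)
h_steered = h + alpha * v            # push h along the concept direction

# The projection of h onto v increases by exactly alpha,
# since v has unit length: (h + alpha*v) @ v - h @ v = alpha * (v @ v).
delta = h_steered @ v - h @ v
print(round(delta, 6))               # → 2.0
```

The key point is that the edit is linear and local: no retraining, just an added vector at inference time.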
OmniTransfer is a single system that learns from a whole reference video, not just one image, so it can copy how things look (identity and style) and how they move (motion, camera, effects).
Placing the reading passage before the question and answer choices (context-question-options, CQO) makes language models much more accurate than placing it after (QOC), by about 15 percentage points on average.
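The difference is purely about how the prompt is assembled. Here is a minimal sketch of the two orderings; the function name and exact formatting are illustrative assumptions, not the paper's actual templates.

```python
def build_prompt(context: str, question: str, options: list[str],
                 order: str = "CQO") -> str:
    """Assemble a multiple-choice prompt in either ordering.

    CQO: context first, then question, then options.
    QOC: question and options first, context last.
    (Template text is illustrative, not from the paper.)
    """
    opts = "\n".join(f"{chr(65 + i)}. {o}" for i, o in enumerate(options))
    if order == "CQO":
        return f"Passage:\n{context}\n\nQuestion: {question}\n{opts}\nAnswer:"
    # QOC ordering
    return f"Question: {question}\n{opts}\n\nPassage:\n{context}\nAnswer:"

cqo = build_prompt("The sky is blue.", "What color is the sky?",
                   ["Red", "Blue"], order="CQO")
qoc = build_prompt("The sky is blue.", "What color is the sky?",
                   ["Red", "Blue"], order="QOC")
print(cqo.index("Passage") < cqo.index("Question"))  # → True
print(qoc.index("Passage") < qoc.index("Question"))  # → False
```

Same content, same model; only the order of the pieces changes, which is what makes the reported accuracy gap notable.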
MoCha is a new AI that swaps a person in a video with a new character using only one mask on one frame and a few reference photos.
Youtu-Agent is a build-and-grow factory for AI agents that cuts manual setup and keeps agents improving over time.
IC-Effect is a new way to add special effects to existing videos by following a text instruction while keeping everything else unchanged.
OmniPSD is a new AI that can both generate layered Photoshop (PSD) files from a text prompt and decompose a flat image into clean, editable layers.
VideoCoF is a new way to edit videos that first figures out WHERE to edit and then performs the edit, like thinking before acting.