Search

"video editing" · 17 results · Keyword

SkyReels-V4: Multi-modal Video-Audio Generation, Inpainting and Editing model

Guibin Chen, Dixuan Lin et al. · Feb 25 · arXiv

SkyReels-V4 is a single, unified model that makes videos and matching sounds together, while also letting you fix or change parts of a video.

#multimodal diffusion transformer #video-audio generation #inpainting

Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation

Intermediate

Zhe Huang, Hao Wen et al. · Dec 30 · arXiv

Multimodal Large Language Models (MLLMs) often hallucinate on videos by trusting words and common sense more than what the frames really show.

#multimodal large language model #video understanding #visual hallucination

V-RGBX: Video Editing with Accurate Controls over Intrinsic Properties

Intermediate

Ye Fang, Tong Wu et al. · Dec 12 · arXiv

V-RGBX is a new video editing system that lets you change the true building blocks of a scene—like base color, surface bumps, material, and lighting—rather than just painting over pixels.

#intrinsic video editing #inverse rendering #forward rendering

Region-Constraint In-Context Generation for Instructional Video Editing

Intermediate

Zhongwei Zhang, Fuchen Long et al. · Dec 19 · arXiv

ReCo is a new way to edit videos just by telling the computer what to change with words, no extra masks needed.

#instruction-based video editing #in-context generation #region constraint

World Models That Know When They Don't Know: Controllable Video Generation with Calibrated Uncertainty

Intermediate

Zhiting Mei, Tenny Yin et al. · Dec 5 · arXiv

This paper teaches video-making AI models to say how sure they are about each tiny part of every frame they create.

#controllable video generation #uncertainty quantification #calibration

PISCO: Precise Video Instance Insertion with Sparse Control

Beginner

Xiangbo Gao, Renjie Li et al. · Feb 9 · arXiv

PISCO is a video AI that lets you place a specific object into a real video exactly where and when you want, using just a few keyframes instead of editing every frame.

#video instance insertion #sparse keyframe control #video diffusion

DreamID-Omni: Unified Framework for Controllable Human-Centric Audio-Video Generation

Intermediate

Xu Guo, Fulong Ye et al. · Feb 12 · arXiv

DreamID-Omni is one model that can create, edit, and animate human-centered videos with matching voices, all in sync.

#audio-video generation #diffusion transformer #identity preservation

Exploring MLLM-Diffusion Information Transfer with MetaCanvas

Intermediate

Han Lin, Xichen Pan et al. · Dec 12 · arXiv

MetaCanvas lets a multimodal language model (MLLM) sketch a plan inside the generator’s hidden canvas so diffusion models can follow it patch by patch.

#MetaCanvas #MLLM #Diffusion Transformer

NOVA: Sparse Control, Dense Synthesis for Pair-Free Video Editing

Intermediate

Tianlin Pan, Jiayi Dai et al. · Mar 3 · arXiv

NOVA is a new video editor that lets you change a few key frames (sparse control) while it carefully keeps the original motion and background details (dense synthesis).

#video editing #pair-free training #sparse control

Memory-V2V: Augmenting Video-to-Video Diffusion Models with Memory

Intermediate

Dohun Lee, Chun-Hao Paul Huang et al. · Jan 22 · arXiv

Memory-V2V teaches video editing AIs to remember what they already changed so new edits stay consistent with old ones.

#multi-turn video editing #video-to-video diffusion #explicit memory

FFP-300K: Scaling First-Frame Propagation for Generalizable Video Editing

Intermediate

Xijie Huang, Chengming Xu et al. · Jan 5 · arXiv

This paper makes video editing easier by teaching an AI to spread changes from the first frame across the whole video smoothly and accurately.

#First-Frame Propagation #Video Editing #FFP-300K

EasyV2V: A High-quality Instruction-based Video Editing Framework

Intermediate

Jinjie Mai, Chaoyang Wang et al. · Dec 18 · arXiv

EasyV2V is a simple but powerful system that edits videos by following plain-language instructions like “make the shirt blue starting at 2 seconds.”

#instruction-based video editing #spatiotemporal mask #text-to-video fine-tuning

Unified Video Editing with Temporal Reasoner

Intermediate

Xiangpeng Yang, Ji Xie et al. · Dec 8 · arXiv

VideoCoF is a new way to edit videos that first figures out where to edit and then performs the edit, like thinking before acting.

#video editing #diffusion transformer #chain-of-frames

InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion

Intermediate

Hoiyeong Jin, Hyojin Jang et al. · Dec 19 · arXiv

InsertAnywhere is a two-stage system that lets you add a new object into any video so it looks like it was always there.

#video object insertion #4D scene geometry #diffusion video generation

IC-Effect: Precise and Efficient Video Effects Editing via In-Context Learning

Intermediate

Yuanhang Li, Yiren Song et al. · Dec 17 · arXiv

IC-Effect is a new way to add special effects to existing videos by following a text instruction while keeping everything else unchanged.

#video editing #visual effects #diffusion transformer

Kiwi-Edit: Versatile Video Editing via Instruction and Reference Guidance

Intermediate

Yiqi Lin, Guoqiang Liang et al. · Mar 2 · arXiv

Kiwi-Edit is a new video editor that follows your words and also copies looks from a picture you give it.

#reference-guided video editing #instruction-based editing #multimodal large language model

ProEdit: Inversion-based Editing From Prompts Done Right

Intermediate

Zhi Ouyang, Dian Zheng et al. · Dec 26 · arXiv

ProEdit is a training-free, plug-and-play method that fixes a common problem in image and video editing: the model clings too hard to the original picture and refuses to change what you asked for.

#ProEdit #inversion-based editing #KV-mix
