🎓How I Study AIHISA
📖Read
📄Papers📰Blogs🎬Courses
💡Learn
🛤️Paths📚Topics💡Concepts🎴Shorts
🎯Practice
🧩Problems🎯Prompts🧠Review
Search
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers6

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#video editing

MoCha:End-to-End Video Character Replacement without Structural Guidance

Intermediate
Zhengbo Xu, Jie Ma et al.Jan 13arXiv

MoCha is a new AI that swaps a person in a video with a new character using only one mask on one frame and a few reference photos.

#video diffusion#character replacement#in-context learning

VINO: A Unified Visual Generator with Interleaved OmniModal Context

Beginner
Junyi Chen, Tong He et al.Jan 5arXiv

VINO is a single AI model that can make and edit both images and videos by listening to text and looking at reference pictures and clips at the same time.

#VINO#unified visual generator#multimodal diffusion transformer

ProEdit: Inversion-based Editing From Prompts Done Right

Intermediate
Zhi Ouyang, Dian Zheng et al.Dec 26arXiv

ProEdit is a training-free, plug-and-play method that fixes a common problem in image and video editing: the model clings too hard to the original picture and refuses to change what you asked for.

#ProEdit#inversion-based editing#KV-mix

Kling-Omni Technical Report

Intermediate
Kling Team, Jialu Chen et al.Dec 18arXiv

Kling-Omni is a single, unified model that can understand text, images, and videos together and then make or edit high-quality videos from those mixed instructions.

#multimodal visual language#MVL#prompt enhancer

IC-Effect: Precise and Efficient Video Effects Editing via In-Context Learning

Intermediate
Yuanhang Li, Yiren Song et al.Dec 17arXiv

IC-Effect is a new way to add special effects to existing videos by following a text instruction while keeping everything else unchanged.

#video editing#visual effects#diffusion transformer

Unified Video Editing with Temporal Reasoner

Intermediate
Xiangpeng Yang, Ji Xie et al.Dec 8arXiv

VideoCoF is a new way to edit videos that first figures out WHERE to edit and then does the edit, like thinking before acting.

#video editing#diffusion transformer#chain-of-frames