FOFPred is a new AI model that reads one or two images plus a short instruction like “move the bottle left to right,” and then predicts how every pixel will move in the moments that follow.
Big video generators (diffusion models) create great videos but are too slow because they clean up noise in hundreds of tiny steps.
VideoAR is a new way to make videos with AI that writes each frame like a story, one step at a time, while painting details from coarse to fine.
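The "one frame at a time" idea above is the same trick language models use: predict the next item from everything generated so far. Here is a minimal toy sketch of that autoregressive loop (not VideoAR's actual code; the frame-update rule is a made-up stand-in for a real neural network):

```python
# Toy sketch of autoregressive video generation: each frame is predicted
# from the frames before it, the way a language model predicts the next
# word from the earlier words. Frames here are just integers.

def predict_next_frame(history):
    # Hypothetical toy rule: the next frame is the last frame plus one.
    # A real model would run a neural network over the whole history
    # (and refine each frame from coarse to fine).
    return history[-1] + 1

def generate_video(first_frame, num_frames):
    frames = [first_frame]
    for _ in range(num_frames - 1):
        frames.append(predict_next_frame(frames))
    return frames

video = generate_video(first_frame=0, num_frames=5)
# → [0, 1, 2, 3, 4]
```

The point of the sketch is the loop shape: each step depends only on what came before, so frames can be streamed out one by one instead of generated all at once.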
This paper shows that the best VAEs for image generation are the ones whose latents neatly separate object attributes, a property called semantic disentanglement.
LTX-2 is an open-source model that makes video and sound together from a text prompt, so the picture and audio match in time and meaning.
VINO is a single AI model that can make and edit both images and videos by listening to text and looking at reference pictures and clips at the same time.
LiveTalk turns slow, many-step video diffusion into a fast, 4-step, real-time system for talking avatars that listen, think, and respond with synchronized video.
Over++ is a video AI that adds realistic effects like shadows, splashes, dust, and smoke between a foreground and a background without changing the original footage.
EasyV2V is a simple but powerful system that edits videos by following plain-language instructions like “make the shirt blue starting at 2 seconds.”
Latent diffusion models are great at making images but learn the meaning of scenes slowly because their training goal mostly teaches them to clean up noise, not to understand objects and layouts.
Steer3D lets you change a 3D object just by typing what you want, like “add a roof rack,” and it does it in one quick pass.
Big text-to-image models make amazing pictures but are slow because they take hundreds of tiny steps to turn noise into an image.
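The slowness mentioned in several of these summaries comes from the same place: a diffusion model must run its network once per clean-up step, so cost grows linearly with the step count. A minimal toy sketch (the "denoising" here is a made-up nudge toward a target, standing in for a real network call):

```python
import numpy as np

def denoise_step(x, step, total_steps):
    # Toy "clean-up" step: nudge the noisy image a fraction of the way
    # toward a target. A real diffusion model runs a large neural network
    # here, which is why hundreds of steps are so expensive.
    target = np.zeros_like(x)  # stand-in for the model's prediction
    alpha = 1.0 / (total_steps - step)  # cover a share of the remaining gap
    return x + alpha * (target - x)

def generate(total_steps, size=8, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(size, size))  # start from pure noise
    for step in range(total_steps):  # one network call per step
        x = denoise_step(x, step, total_steps)
    return x

# Same final result, but the first run does 250x more work:
slow = generate(total_steps=1000)
fast = generate(total_steps=4)
```

Both runs end at the target; the only difference is how many times the (expensive) per-step computation is executed, which is why cutting hundreds of steps down to a handful gives real-time systems like the ones above.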