Focal Guidance: Unlocking Controllability from Semantic-Weak Layers in Video Diffusion Models
Intermediate | Yuanyang Yin, Yufan Deng et al. | Jan 12 | arXiv
Image-to-Video models often keep the picture looking right but ignore parts of the text instructions.
#Image-to-Video generation #Diffusion Transformer #Controllability