Efficient Autoregressive Video Diffusion with Dummy Head
Intermediate · Hang Guo, Zhaoyang Jia et al. · Jan 28 · arXiv
This paper finds that roughly one in four attention heads in autoregressive video diffusion models attends almost exclusively to the current frame and nearly ignores past frames, wasting memory and time.
#autoregressive video diffusion #multi-head self-attention #KV cache compression
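
To make the summary's claim concrete, here is a minimal sketch of how one might measure, per head, the share of attention that lands on the current frame. It is not the paper's procedure: the function names, the `[head][query][key]` layout, the assumption that current-frame keys sit at the end of the key axis, and the 0.9 threshold are all hypothetical choices made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def current_frame_mass(scores, tokens_per_frame):
    """Average fraction of attention each head places on current-frame keys.

    scores[h][q][k] holds raw attention logits for head h and query q
    (a token of the current frame) over key k. Assumption: keys are ordered
    as past frames first, then the current frame's tokens_per_frame tokens.
    """
    masses = []
    for head in scores:
        per_query = []
        for row in head:
            probs = softmax(row)
            # Attention mass that falls on the last tokens_per_frame keys.
            per_query.append(sum(probs[-tokens_per_frame:]))
        masses.append(sum(per_query) / len(per_query))
    return masses


def flag_current_frame_heads(masses, threshold=0.9):
    """Indices of heads whose current-frame mass is at or above threshold.

    The threshold is a hypothetical cutoff, not a value from the paper.
    """
    return [h for h, m in enumerate(masses) if m >= threshold]


# Toy input: 2 heads, 2 queries, 6 keys (2 past frames + current frame,
# 2 tokens per frame).
tokens_per_frame = 2
focused_head = [[0.0, 0.0, 0.0, 0.0, 10.0, 10.0]] * 2   # peaks on current frame
uniform_head = [[0.0] * 6] * 2                           # spreads attention evenly
scores = [focused_head, uniform_head]

masses = current_frame_mass(scores, tokens_per_frame)
flagged = flag_current_frame_heads(masses)
print(masses, flagged)
```

In this toy input only head 0 is flagged, since the uniform head places just a third of its attention on the current frame. For a flagged head, the cached keys and values of past frames contribute little to its output, which is the kind of waste the summary describes.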