SALAD: Achieve High-Sparsity Attention via Efficient Linear Attention Tuning for Video Diffusion Transformer
IntermediateTongcheng Fang, Hanling Zhang et al.Jan 23arXiv
Videos are made of very long lists of tokens, and regular attention looks at every pair of tokens, which is slow and expensive.
#SALAD#sparse attention#linear attention