SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning
Intermediate · Jintao Zhang, Kai Jiang et al. · Feb 13 · arXiv
Video generators are slow because full attention compares every token with every other token, so its cost grows quadratically with the number of tokens.
#sparse attention #Top-k masking #Top-p masking
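The title names a hybrid Top-k+Top-p mask. The sketch below shows one generic way to build such a mask for a single query row of attention scores. It is not the paper's implementation. It assumes that "hybrid" means the union of the k highest-probability entries and the smallest set of entries whose cumulative probability reaches p. The function name `hybrid_topk_topp_mask` and its parameters are hypothetical.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hybrid_topk_topp_mask(scores: List[float], k: int, p: float) -> List[bool]:
    """Hypothetical keep-mask for one query row of attention scores.

    Assumption (not taken from the paper): the hybrid mask is the UNION of
    the top-k entries and the top-p set (highest-probability entries kept
    until their cumulative probability first reaches p).
    """
    probs = softmax(scores)
    # Indices sorted from highest to lowest probability.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = [False] * len(probs)

    # Top-k part: always keep the k highest-probability entries.
    for i in order[:k]:
        keep[i] = True

    # Top-p part: add entries in descending order until mass >= p.
    cumulative = 0.0
    for i in order:
        keep[i] = True
        cumulative += probs[i]
        if cumulative >= p:
            break
    return keep


if __name__ == "__main__":
    # Peaked row: one entry holds ~98% of the mass, so little is kept.
    print(hybrid_topk_topp_mask([5.0, 0.0, 0.0, 0.0], k=1, p=0.9))
    # Flat row: top-p must keep every entry to reach 90% of the mass.
    print(hybrid_topk_topp_mask([0.0, 0.0, 0.0, 0.0], k=1, p=0.9))
```

Under this assumption, top-p adapts to the shape of the attention distribution (few entries for peaked rows, many for flat rows), while top-k guarantees a minimum number of kept entries. Whether SpargeAttention2 combines the two this way, or applies them per token or per block, is not stated in this summary.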