The paper makes long video generation much faster and lighter on memory by eliminating redundant computation in attention.
The paper shows that judging vector search only by distance-based recall and speed can be very misleading for real tasks.