DFlash is a new method for making large language models generate answers much faster without changing the final answers.
Videos are made of very long sequences of tokens, and regular attention compares every token with every other token, so its cost grows with the square of the sequence length, which makes it slow and expensive.
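
To make the "every pair of tokens" cost concrete, here is a minimal sketch in plain Python. It is only an illustration of regular (full) attention, not code from any particular model; the function names `pair_count` and `naive_attention` and the token counts in the loop are hypothetical choices made for this example.

```python
import math

def pair_count(num_tokens: int) -> int:
    """Number of query-key pairs that full attention scores (per head, per layer)."""
    return num_tokens * num_tokens

def naive_attention(queries, keys, values):
    """Full softmax attention over lists of equal-length float vectors."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs = []
    for q in queries:
        # One score per key for every query: this nested work is the quadratic part.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        top = max(scores)  # subtract the max so exp() stays numerically stable
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        # Weighted average of the value vectors, one output dimension at a time.
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(len(values[0]))
        ])
    return outputs

# Hypothetical sequence lengths, chosen only to show the growth rate.
for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} tokens -> {pair_count(n):,} pairs")
```

Making the sequence ten times longer makes the number of pairs one hundred times larger, which is why long video token sequences are so costly for regular attention.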