DFlash is a new way to make big language models answer much faster without changing the final answers.
Videos are turned into very long lists of tokens, and regular (full) attention compares every token with every other token, so its cost grows with the square of the number of tokens and quickly becomes slow and expensive.
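A quick count makes the cost concrete. Full attention scores every token against every token, so n tokens need n × n scores per head per layer. The sketch below is only an illustration. The helper names `attention_pairs` and `video_tokens` are hypothetical, and the frame and per-frame token counts are made-up example values, not figures from the paper.

```python
def attention_pairs(num_tokens: int) -> int:
    """Query-key scores computed by full self-attention (one head, one layer)."""
    return num_tokens * num_tokens


def video_tokens(frames: int, tokens_per_frame: int) -> int:
    """Hypothetical tokenization: each frame becomes a fixed number of tokens."""
    return frames * tokens_per_frame


# Illustrative sequence lengths (assumed values, not from the paper).
for n in (1_000, 10_000, 100_000):
    print(f"{n:>7,} tokens -> {attention_pairs(n):>15,} scores")

# An assumed short clip: 64 frames with 256 tokens each.
clip = video_tokens(frames=64, tokens_per_frame=256)
print(f"{clip:,} video tokens -> {attention_pairs(clip):,} scores")
```

Ten times more tokens means a hundred times more scores. Even the small example clip, with 16,384 tokens, already needs about 268 million scores per head per layer.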
This paper teaches a language model to think along several paths at the same time instead of one step after another.