DFlash is a new way to make big language models answer much faster without changing the final answers.
DEER is a new way to speed up big language models by letting a diffusion model draft many tokens at once and an autoregressive model double-check them.
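
The draft-then-check idea can be shown with a toy sketch. This is not DEER's actual implementation: `target_next`, `draft_block`, and `speculative_decode` are hypothetical names, the "models" are simple arithmetic stand-ins, and the greedy accept rule is an assumption chosen because it keeps the output identical to plain step-by-step decoding.

```python
from typing import List, Tuple

VOCAB = 7  # toy vocabulary size (assumption for illustration)


def target_next(prefix: List[int]) -> int:
    """Toy stand-in for the autoregressive model: one deterministic next token."""
    return (sum(prefix) * 31 + len(prefix)) % VOCAB


def draft_block(prefix: List[int], k: int) -> List[int]:
    """Toy stand-in for a diffusion drafter that proposes k tokens at once.

    A real drafter would fill the block in parallel; here we imitate the
    target and deliberately make a mistake at some positions.
    """
    ctx = list(prefix)
    block = []
    for _ in range(k):
        tok = target_next(ctx)
        if len(ctx) % 5 == 4:          # inject a wrong guess now and then
            tok = (tok + 1) % VOCAB
        block.append(tok)
        ctx.append(tok)
    return block


def plain_decode(prompt: List[int], n_new: int) -> List[int]:
    """Baseline: generate one token per step with the target model only."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target_next(seq))
    return seq


def speculative_decode(prompt: List[int], n_new: int, k: int) -> Tuple[List[int], int]:
    """Draft k tokens, keep the agreeing prefix, fix the first mismatch."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    rounds = 0
    while len(seq) < goal:
        rounds += 1
        for tok in draft_block(seq, k):
            if len(seq) >= goal:
                break
            expected = target_next(seq)   # the target's own choice here
            seq.append(expected)          # always keep the target's token
            if tok != expected:           # draft was wrong: stop this round
                break
    return seq, rounds


if __name__ == "__main__":
    out, rounds = speculative_decode([1, 2, 3], n_new=20, k=4)
    print(out == plain_decode([1, 2, 3], 20), rounds)
```

Because every kept token is the one the target model would have picked anyway, the final sequence matches plain decoding. The saving comes from needing fewer rounds than tokens, which pays off in a real system when the target can check a whole drafted block in a single pass.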