Papers (3)

#EAGLE-3

LK Losses: Direct Acceptance Rate Optimization for Speculative Decoding

Alexander Samarin, Sergei Krutikov et al. · Feb 27 · arXiv

Speculative decoding speeds up big language models by letting a small helper model guess several next words and having the big model check them all at once.

#speculative decoding #acceptance rate #LK losses


DFlash: Block Diffusion for Flash Speculative Decoding

Intermediate

Jian Chen, Yesheng Liang et al. · Feb 5 · arXiv

DFlash is a new way to make big language models answer much faster without changing the final answers.

#DFlash #speculative decoding #diffusion language model


DEER: Draft with Diffusion, Verify with Autoregressive Models

Intermediate

Zicong Cheng, Guo-Wei Yang et al. · Dec 17 · arXiv

DEER is a new way to speed up big language models by letting a diffusion model draft many tokens at once and an autoregressive model double-check them.

#DEER #speculative decoding #diffusion LLM
