How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (5)

#speculative decoding

DFlash: Block Diffusion for Flash Speculative Decoding

Intermediate
Jian Chen, Yesheng Liang et al. · Feb 5 · arXiv

DFlash is a new way to make big language models answer much faster without changing the final answers.

#DFlash #speculative decoding #diffusion language model

GlimpRouter: Efficient Collaborative Inference by Glimpsing One Token of Thoughts

Beginner
Wenhao Zeng, Xuteng Zhang et al. · Jan 8 · arXiv

Big reasoning AIs think in many steps, which is slow and costly.

#collaborative inference #initial token entropy #step-level routing

DEER: Draft with Diffusion, Verify with Autoregressive Models

Intermediate
Zicong Cheng, Guo-Wei Yang et al. · Dec 17 · arXiv

DEER is a new way to speed up big language models by letting a diffusion model draft many tokens at once and an autoregressive model double-check them.

#DEER #speculative decoding #diffusion LLM
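
The draft-then-verify idea in the DEER summary can be illustrated with a toy greedy sketch. This is not the paper's implementation: `speculative_step`, `toy_target`, and the integer tokens are hypothetical names invented for illustration, and a real system would verify the whole drafted block in one parallel forward pass of the target model rather than in a Python loop.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]


def speculative_step(
    prefix: List[Token],
    draft_block: List[Token],
    target_next: NextToken,
) -> List[Token]:
    """Check a drafted block against the target model (greedy variant).

    Drafted tokens are kept while they match what the target model would
    have produced. At the first mismatch the target's token is used instead
    and the rest of the draft is discarded.
    """
    accepted: List[Token] = []
    context = list(prefix)
    for tok in draft_block:
        expected = target_next(context)
        if tok != expected:
            accepted.append(expected)  # correct the first wrong token
            return accepted
        accepted.append(tok)
        context.append(tok)
    # Every drafted token matched, so the target contributes one extra token.
    accepted.append(target_next(context))
    return accepted


def toy_target(context: List[Token]) -> Token:
    """Hypothetical stand-in for the large model: counts upward modulo 10."""
    return (context[-1] + 1) % 10


# The draft is right twice, then wrong (9 instead of 3).
partial = speculative_step([0], [1, 2, 9, 4], toy_target)
# The draft is right every time.
full = speculative_step([0], [1, 2, 3], toy_target)
```

Because every kept token equals the target model's own greedy choice, the output matches what the target model would have written alone; the speedup comes from accepting several drafted tokens per verification round.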

ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding

Intermediate
Jia-Nan Li, Jian Guan et al. · Dec 15 · arXiv

ReFusion is a new way for AI to write text faster by planning in chunks (called slots) and then filling each chunk carefully.

#ReFusion #masked diffusion model #parallel decoding

Arbitrage: Efficient Reasoning via Advantage-Aware Speculation

Intermediate
Monishwaran Maheswaran, Rishabh Tiwari et al. · Dec 4 · arXiv

ARBITRAGE makes AI solve step-by-step problems faster by only using the big, slow model when it is predicted to truly help.

#speculative decoding #step-level speculative decoding #advantage-aware routing
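
The routing idea in the ARBITRAGE summary, calling the large model only when it is predicted to help, might look roughly like the sketch below. Everything here is an assumption for illustration: `route_step`, the `predicted_advantage` score, and the `threshold` are hypothetical, and the summary does not say how the paper computes or thresholds its prediction.

```python
from typing import Callable, Tuple


def route_step(
    draft_step: str,
    predicted_advantage: float,
    threshold: float,
    target_generate: Callable[[], str],
) -> Tuple[str, str]:
    """Choose between the cheap draft step and a call to the large model.

    predicted_advantage is a hypothetical estimate of how much better the
    large model's step would be than the drafted one. The large model is
    called only when that estimate exceeds the threshold.
    """
    if predicted_advantage > threshold:
        return target_generate(), "target"
    return draft_step, "draft"


def big_model_step() -> str:
    """Hypothetical stand-in for the large, slow model."""
    return "x = 4 (recomputed by the large model)"


kept = route_step("x = 4", predicted_advantage=0.02, threshold=0.1,
                  target_generate=big_model_step)
rerouted = route_step("x = 5", predicted_advantage=0.6, threshold=0.1,
                      target_generate=big_model_step)
```

Passing the large model as a callable means it is not invoked at all when the draft step is kept, which is where the saving would come from in this toy setup.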