Papers (8)

#parallel decoding

Residual Context Diffusion Language Models

Yuezhou Hu, Harman Singh et al. · Jan 30 · arXiv

Diffusion language models (dLLMs) generate several tokens at once but usually discard useful intermediate information at each step; RCD keeps and reuses it.

#diffusion language models #residual context diffusion #soft tokens

Parallel Context-of-Experts Decoding for Retrieval Augmented Generation

Intermediate

Giulio Corallo, Paolo Papotti · Jan 13 · arXiv

This paper introduces PCED, a way to use many documents as separate 'experts' in parallel so an AI can stitch answers together without stuffing everything into one giant prompt.

#Retrieval-Augmented Generation #PCED #contrastive decoding

On the Role of Discreteness in Diffusion LLMs

Intermediate

Ziqi Jin, Bin Wang et al. · Dec 27 · arXiv

The paper asks what a truly good diffusion-based language model should look like and lists five must-have properties.

#diffusion language models #smooth corruption #discrete tokens

Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone

Intermediate

Jiacheng Ye, Shansan Gong et al. · Dec 27 · arXiv

Dream-VL and Dream-VLA use a diffusion language model backbone to understand images, talk about them, and plan actions better than many regular (autoregressive) models.

#diffusion language model #vision-language model #vision-language-action

Fast and Accurate Causal Parallel Decoding using Jacobi Forcing

Intermediate

Lanxiang Hu, Siqi Kou et al. · Dec 16 · arXiv

Autoregressive (AR) models normally write one token at a time, which is accurate but slow for long answers.

#Jacobi Forcing #Jacobi decoding #consistency distillation

ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding

Intermediate

Jia-Nan Li, Jian Guan et al. · Dec 15 · arXiv

ReFusion is a new way for AI to write text faster by planning in chunks (called slots) and then filling each chunk carefully.

#ReFusion #masked diffusion model #parallel decoding

LLaDA2.0: Scaling Up Diffusion Language Models to 100B

Intermediate

Tiwei Bie, Maosong Cao et al. · Dec 10 · arXiv

Before this work, most large language models generated text one token at a time (autoregressive), which made them slow and hard to parallelize.

#diffusion language model #masked diffusion #block diffusion

Learning Unmasking Policies for Diffusion Language Models

Intermediate

Metod Jazbec, Theo X. Olausson et al. · Dec 9 · arXiv

Diffusion language models write by gradually unmasking hidden words, so deciding which blanks to reveal next is a big deal for both speed and accuracy.

#diffusion language models #masked diffusion #unmasking policy