Autoregressive (AR) language models generate text one token at a time, conditioning each new token on everything produced so far. This sequential dependency makes decoding accurate but slow for long answers, and it is hard to parallelize across token positions.
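The sequential bottleneck can be sketched with a minimal greedy decoding loop. The "model" below is a toy stand-in (an assumption for illustration, not a real LM): what matters is that each step consumes the full prefix, so the steps cannot run in parallel.

```python
def toy_next_token(prefix):
    """Toy stand-in for a language model's next-token prediction.
    (Hypothetical rule: next token is the prefix sum modulo 10.)"""
    return sum(prefix) % 10

def generate(prompt, num_tokens):
    """Autoregressive loop: each new token depends on ALL prior tokens,
    so the num_tokens steps must execute strictly one after another."""
    tokens = list(prompt)
    for _ in range(num_tokens):
        tokens.append(toy_next_token(tokens))
    return tokens

print(generate([1, 2, 3], 4))  # → [1, 2, 3, 6, 2, 4, 8]
```

Because step *t* reads the output of step *t−1*, the wall-clock cost grows linearly with the number of generated tokens, which is exactly the limitation non-autoregressive approaches try to remove.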