This paper fixes a hidden flaw in a popular image tokenizer (FSQ) with a simple one-line change to its activation function.
Large video generators (diffusion models) create great videos but are too slow, because each video takes hundreds of small denoising ("clean-up") steps.