Big video generators (diffusion models) create great videos, but they are slow because they need hundreds of small clean-up steps, and each step is a full run of a large neural network.
Big text-to-image models make amazing pictures, but they are slow because they take hundreds of tiny steps to turn noise into an image.
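To see why the step count matters, here is a toy sketch of a step-by-step sampling loop. It is not a real diffusion model. `ToyDenoiser` is a hypothetical stand-in for the big network, and it assumes the "clean image" is all zeros. It counts how many times the expensive network runs, showing that the cost grows in direct proportion to the number of steps.

```python
import numpy as np

class ToyDenoiser:
    """Hypothetical stand-in for a large neural network.
    Assumes the clean image is all zeros, so the noise it predicts is just x."""

    def __init__(self):
        self.calls = 0  # how many times the "expensive" network was run

    def predict_noise(self, x, t):
        self.calls += 1
        return x

def sample(model, num_steps, shape=(8, 8), seed=0):
    """Start from pure noise and remove a small slice of it at every step."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)  # pure noise
    for t in range(num_steps, 0, -1):
        # Each step calls the network once and removes 1/t of the remaining noise.
        x = x - model.predict_noise(x, t) / t
    return x

for steps in (1000, 4):
    model = ToyDenoiser()
    image = sample(model, steps)
    print(steps, "steps ->", model.calls, "network calls")
```

In a real model each call is a large network pass, so cutting 1000 steps down to 4 would cut the network work by roughly 250 times. That is why faster samplers try to reach a good image in far fewer steps.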