This paper speeds up image and video generators called diffusion transformers by changing how big their puzzle pieces (patches) are at each step.
Video generators are slow because full attention compares every token with every other token, which takes a lot of time.
PISCO is a video AI that lets you place a specific object into a real video exactly where and when you want, using just a few keyframes instead of editing every frame.
Diffusion models make great images and videos but are slow because they usually need many tiny steps.
This paper fixes a big problem in long video generation: tiny mistakes that snowball over time and make the video drift and flicker.
FSVideo is a new image-to-video generator that runs about 42× faster than popular open-source models while keeping similar visual quality.
The paper makes long video generation much faster and lighter on memory by cutting out repeated work in attention.
This paper shows how to make text-to-video models create clearer, steadier, and more on-topic videos without using any human-labeled ratings.
Videos are made of very long lists of tokens, and regular attention looks at every pair of tokens, which is slow and expensive.
This paper argues that modern video generators are starting to act like tiny "world simulators," not just pretty video painters.
This paper teaches video-making AIs to follow real-world physics, so rolling balls roll right and collisions look believable.
Big video generators (diffusion models) create great videos but are too slow because they use hundreds of tiny clean-up (denoising) steps.