FSVideo is a new image-to-video generator that runs about 42× faster than popular open-source models while keeping similar visual quality.
This paper makes long video generation much faster and lighter on memory by cutting out redundant computation in attention.
This paper shows how to make text-to-video models create clearer, steadier, and more on-topic videos without using any human-labeled ratings.
Videos are made of very long lists of tokens, and regular attention looks at every pair of tokens, which is slow and expensive.
This paper says modern video generators are starting to act like tiny "world simulators," not just pretty video painters.
This paper teaches video-making AIs to follow real-world physics, so rolling balls roll right and collisions look believable.
Large video diffusion models create great videos but are slow because they need hundreds of small denoising ("clean-up") steps.
VideoAR is a new way to make videos with AI that writes each frame like a story, one step at a time, while painting details from coarse to fine.
SurgWorld teaches surgical robots using videos plus text, then guesses the missing robot moves so we can train good policies without collecting tons of real robot-action data.
Yume1.5 is a model that turns text or a single image into a living, explorable video world you can move through with keyboard keys.
HiStream makes 1080p video generation much faster by removing repeated work across space, time, and steps.
SemanticGen is a new way to make videos that starts by planning in a small, high-level "idea space" (semantic space) and then adds the tiny visual details later.
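
To make the "every pair of tokens" point above concrete, here is a minimal sketch that counts how many token pairs regular attention scores, next to a simple local-window variant. The window scheme and the function names are illustrative assumptions only, not the method of any paper listed here, and the token count in the usage lines is hypothetical.

```python
def full_attention_pairs(num_tokens: int) -> int:
    """Pairs scored by regular attention: every token looks at every token."""
    return num_tokens * num_tokens


def windowed_attention_pairs(num_tokens: int, window: int) -> int:
    """Pairs scored when each token only looks at itself and up to
    `window` neighbors on each side (an assumed scheme, for illustration)."""
    total = 0
    for i in range(num_tokens):
        first = max(0, i - window)               # clip at the start of the sequence
        last = min(num_tokens - 1, i + window)   # clip at the end of the sequence
        total += last - first + 1
    return total


if __name__ == "__main__":
    n = 10_000  # hypothetical token count for a short clip
    print(full_attention_pairs(n))
    print(windowed_attention_pairs(n, window=64))
```

Doubling the number of tokens quadruples the full-attention count, while the windowed count only roughly doubles, which is why long videos are so costly for regular attention.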