VideoAR is a new way to make videos with AI: it writes each frame like a story, one step at a time, while painting each frame's details from coarse to fine.
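Here's a toy sketch of that two-level loop (frame-by-frame on the outside, coarse-to-fine on the inside); it is not VideoAR's actual code, and the `model` callable, scale sizes, and shapes are all illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def generate_video(model, num_frames=4, scales=(8, 16, 32), channels=3):
    """model(context, coarse_frame) -> residual detail at coarse_frame's size."""
    frames = []
    for _ in range(num_frames):
        # Condition on every frame written so far (the "story" context).
        context = torch.stack(frames) if frames else torch.zeros(0, channels, scales[-1], scales[-1])
        frame = torch.zeros(channels, scales[0], scales[0])   # coarsest guess
        for s in scales:                                      # coarse -> fine
            frame = F.interpolate(frame[None], size=(s, s), mode="bilinear",
                                  align_corners=False)[0]
            frame = frame + model(context, frame)             # paint finer detail
        frames.append(frame)
    return torch.stack(frames)                                # (T, C, H, W)

dummy = lambda ctx, x: torch.zeros_like(x)                    # stand-in model
video = generate_video(dummy)                                 # (4, 3, 32, 32)
```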
SurgWorld teaches surgical robots using videos plus text, then guesses the missing robot moves so we can train good policies without collecting tons of real robot-action data.
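The "guess the missing moves" idea usually amounts to an inverse-dynamics model: predict the action that connects two consecutive frames, then use it to pseudo-label action-free video. A minimal sketch, assuming simple feature vectors and a 7-DoF action space (all names and sizes are ours, not the paper's):

```python
import torch
import torch.nn as nn

class InverseDynamics(nn.Module):
    """Predicts the robot action that takes frame t to frame t+1."""
    def __init__(self, frame_dim=512, action_dim=7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * frame_dim, 256), nn.ReLU(), nn.Linear(256, action_dim)
        )

    def forward(self, feat_t, feat_t1):
        return self.net(torch.cat([feat_t, feat_t1], dim=-1))

idm = InverseDynamics()
feat_t, feat_t1 = torch.randn(4, 512), torch.randn(4, 512)
pseudo_actions = idm(feat_t, feat_t1)  # (4, 7): labels for policy training
```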
Yume1.5 is a model that turns text or a single image into a living, explorable video world you can move through with the keyboard.
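Keyboard control typically means each keypress becomes an action embedding that conditions the next frame. A toy loop showing the idea, with a dummy stand-in for the generator (the key mapping and embedding size are assumptions, not Yume1.5's):

```python
import torch

ACTIONS = {"w": 0, "a": 1, "s": 2, "d": 3}           # forward/left/back/right
action_table = torch.nn.Embedding(len(ACTIONS), 64)  # learned action codes

def step(generator, prev_frame, key):
    act = action_table(torch.tensor(ACTIONS[key]))   # keypress -> embedding
    return generator(prev_frame, act)                # next frame of the world

generator = lambda frame, act: frame + 0.01 * act.mean()  # dummy stand-in
frame = torch.zeros(3, 64, 64)
for key in "wwad":                                   # simulated keystrokes
    frame = step(generator, frame, key)
```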
HiStream makes 1080p video generation much faster by removing repeated work across space, time, and denoising steps.
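One common way to remove step-level redundancy is caching: reuse a block's output when its input barely changed since the previous denoising step. A sketch of that pattern under assumptions of our own (the tolerance and structure are illustrative, not HiStream's exact scheme):

```python
import torch

class ReuseCache:
    """Reuses a block's cached output if its input is nearly unchanged."""
    def __init__(self, tol=1e-2):
        self.tol, self.inp, self.out = tol, None, None

    def __call__(self, block, x):
        if self.inp is not None and (x - self.inp).abs().mean() < self.tol:
            return self.out                    # reuse: input ~ unchanged
        self.inp, self.out = x, block(x)       # recompute and cache
        return self.out

cache = ReuseCache()
block = torch.nn.Linear(16, 16)
x = torch.randn(2, 16)
y1 = cache(block, x)                           # computed
y2 = cache(block, x + 1e-4)                    # reused: change below tolerance
```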
SemanticGen is a new way to make videos that starts by planning in a small, high-level 'idea space' (semantic space) and then adds the tiny visual details later.
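The two-stage split looks roughly like this: one model samples a small semantic latent (the plan), and a second model decodes it into pixels. A minimal sketch, with both modules and all dimensions assumed for illustration:

```python
import torch
import torch.nn as nn

semantic_prior = nn.Linear(128, 32)          # stage 1: text emb -> semantic plan
detail_decoder = nn.Linear(32, 3 * 16 * 16)  # stage 2: plan -> frame pixels

text_emb = torch.randn(1, 128)
plan = semantic_prior(text_emb)              # small, high-level "idea space" code
frame = detail_decoder(plan).view(1, 3, 16, 16)  # tiny visual details added last
```

Doing the hard planning in a 32-dim space is much cheaper than planning directly over pixels, which is the point of the split.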
This paper teaches a video generator to move things realistically by borrowing motion knowledge from a strong video tracker.
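"Borrowing motion knowledge" can be done by feature distillation: align the generator's intermediate features with those of a frozen, pretrained tracker. A sketch of that loss under our own assumptions (the projection head, feature shapes, and MSE choice are not from the paper):

```python
import torch
import torch.nn as nn

tracker_feats = torch.randn(2, 8, 256)   # frozen tracker features (B, T, D)
gen_feats = torch.randn(2, 8, 512, requires_grad=True)  # generator features

align = nn.Linear(512, 256)              # project generator -> tracker space
loss = nn.functional.mse_loss(align(gen_feats), tracker_feats)
loss.backward()                          # gradient flows into the generator
```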
MetaCanvas lets a multimodal language model (MLLM) sketch a plan inside the generator’s hidden canvas (its latent space) so diffusion models can follow it patch by patch.
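"Patch by patch" suggests the MLLM emits one plan token per spatial patch, and the diffusion model sees the matching token alongside each latent patch. A minimal sketch; the grid size, token dimension, and fusion-by-addition are illustrative assumptions rather than MetaCanvas's actual design:

```python
import torch
import torch.nn as nn

H = W = 4                                # patch grid of the latent canvas
plan = torch.randn(1, H * W, 64)         # MLLM's plan: one token per patch
patches = torch.randn(1, H * W, 64)      # noisy latent patches to denoise

fuse = nn.Linear(64, 64)
conditioned = patches + fuse(plan)       # each patch follows its plan token
```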