HERMES is a training-free way to make video-language models understand live, streaming video quickly and accurately.
The paper trains a game-playing AI by behavior cloning (copying good human players) and shows that simply scaling up the model and the data makes the AI reason more causally: it pays attention to what truly causes outcomes on screen.
Yume1.5 is a model that turns text or a single image into a living, explorable video world you can move through with the keyboard.