FOFPred is a new AI that reads one or two images plus a short instruction like "move the bottle left to right," and then predicts how every pixel will move in the next moments.
This paper teaches text-to-video models to follow real-world physics, so people, balls, water, glass, and fire act the way they should.
SurgWorld teaches surgical robots using videos plus text, then guesses the missing robot moves so we can train good policies without collecting tons of real robot-action data.
JavisGPT is a single AI that can both understand sounding videos (audio + video together) and create new ones whose sound and picture stay in sync.