Robots need videos that not only look good but also obey real-world physics and show the requested task being completed.
FOFPred is a new AI model that takes one or two images plus a short instruction like “move the bottle left to right” and predicts how every pixel will move over the next few moments.
Motive is a new way to figure out which training videos teach an AI how to move things realistically, not just how they look.
RadarGen is a tool that learns to generate realistic automotive radar point clouds from multiple camera views alone.
InsertAnywhere is a two-stage system that lets you add a new object into any video so it looks like it was always there.
This paper teaches a video-understanding AI to think in 3D plus time (4D) so it can answer questions about specific objects moving in videos.
CRISP turns a normal phone video of a person into a clean 3D world and a virtual human that can move in it without breaking physics.
This paper creates MotionEdit, a high-quality dataset that teaches AI to change how people and objects move in a picture without breaking their looks or the scene.
UnityVideo is a single unified model that learns from many kinds of video information at once, including color (RGB), depth, motion (optical flow), body pose, skeletons, and segmentation, to make smarter, more realistic videos.