Papers2

#vision-language-action (VLA)

FRAPPE: Infusing World Modeling into Generalist Policies via Multiple Future Representation Alignment

Robots learn better when they predict short, meaningful summaries of future images instead of drawing every pixel of the future scene.

#world modeling#vision-language-action (VLA)#diffusion policy

Future Optical Flow Prediction Improves Robot Control & Video Generation

Intermediate

Kanchana Ranasinghe, Honglu Zhou et al.Jan 15arXiv

FOFPred is a new AI that reads one or two images plus a short instruction like “move the bottle left to right,” and then predicts how every pixel will move in the next moments.

#optical flow#future optical flow prediction#vision-language model