FRAPPE: Infusing World Modeling into Generalist Policies via Multiple Future Representation Alignment
Intermediate | Han Zhao, Jingbo Wang et al. | Feb 19 | arXiv
Robots learn better when they predict short, meaningful summaries of future images instead of drawing every pixel of the future scene.
#world modeling #vision-language-action (VLA) #diffusion policy