How I Study AI - Learn AI Papers & Lectures the Easy Way

FRAPPE: Infusing World Modeling into Generalist Policies via Multiple Future Representation Alignment

Intermediate

Han Zhao, Jingbo Wang et al. · Feb 19 · arXiv

Robots learn better when they predict short, meaningful summaries of future images instead of drawing every pixel of the future scene.

#world modeling #vision-language-action (VLA) #diffusion policy

Late-to-Early Training: LET LLMs Learn Earlier, So Faster and Better

Intermediate

Ji Zhao, Yufei Gu et al. · Feb 5 · arXiv

Big idea: use a small, already-trained model to help a bigger model learn good habits early, so the big one trains faster and ends up smarter.

#Late-to-Early Training #LLM pretraining acceleration #representation alignment

Next-Embedding Prediction Makes Strong Vision Learners

Beginner

Sihan Xu, Ziqiao Ma et al. · Dec 18 · arXiv

This paper introduces NEPA, a very simple way to teach vision models by having them predict the next patch’s embedding in an image sequence, just like language models predict the next word.

#self-supervised learning #vision transformer #autoregression
