Learning from Next-Frame Prediction: Autoregressive Video Modeling Encodes Effective Representations
Beginner · Jinghan Li, Yang Jin et al. · Dec 24 · arXiv
This paper introduces NExT-Vid, a way to teach a video model by asking it to guess the next frame of a video while parts of the past are hidden.
#autoregressive video pretraining #masked next-frame prediction #context isolation
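
The training setup in the summary above (predict the next frame while parts of the past are hidden) can be sketched roughly as follows. This is a toy illustration, not the paper's implementation: the function name `make_masked_next_frame_pairs`, the `mask_ratio` value, the per-pixel random masking, and the zero-fill for hidden values are all assumptions made for the example.

```python
import numpy as np

def make_masked_next_frame_pairs(video, mask_ratio=0.5, seed=0):
    """Build (masked past, hidden mask, next frame) training triples.

    video: array of shape (T, H, W), a toy grayscale clip.
    mask_ratio: fraction of past pixels to hide (illustrative assumption).
    """
    rng = np.random.default_rng(seed)
    pairs = []
    for t in range(1, video.shape[0]):
        past = video[:t].copy()
        # Hide a random subset of past pixels by zeroing them out.
        hidden = rng.random(past.shape) < mask_ratio
        past[hidden] = 0.0
        # The target is the untouched frame that follows the visible past.
        pairs.append((past, hidden, video[t]))
    return pairs

# Toy clip: 4 frames of 2x2 pixels, all values nonzero.
video = np.arange(4 * 2 * 2, dtype=np.float64).reshape(4, 2, 2) + 1.0
pairs = make_masked_next_frame_pairs(video)
```

A model trained on such triples would take the masked past as input and be scored on how well it reconstructs the target frame; the model and loss are left out here because the summary does not specify them.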