This paper makes video editing easier by training a model to carry edits made to the first frame through the rest of the video smoothly and accurately.
Recurrent Neural Networks (RNNs) are neural networks that learn from sequences, like sentences or time series, by carrying forward a hidden state that summarizes what came before.
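To make the "remembering" concrete, here is a minimal sketch of a vanilla (Elman-style) RNN in plain Python. The memory is a hidden state vector that is updated at every step from the current input and the previous hidden state. The function names, the toy weights, and the tanh activation are illustrative assumptions, not taken from any particular paper or library, and the weights are fixed by hand rather than learned.

```python
import math

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One recurrent update: h_t = tanh(W_xh x_t + W_hh h_prev + b_h)."""
    h_new = []
    for i in range(len(b_h)):
        total = b_h[i]
        total += sum(W_xh[i][j] * x_t[j] for j in range(len(x_t)))
        total += sum(W_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_new.append(math.tanh(total))
    return h_new

def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run the cell over a whole sequence and return every hidden state."""
    h = [0.0] * len(b_h)  # the "memory" starts out empty
    states = []
    for x_t in inputs:
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
        states.append(h)
    return states

# Hypothetical hand-picked weights (a trained network would learn these).
W_xh = [[0.5, -0.3], [0.1, 0.8]]   # input -> hidden
W_hh = [[0.2, 0.0], [-0.4, 0.3]]   # hidden -> hidden (the recurrence)
b_h = [0.0, 0.0]

# Two sequences that end with the same input but differ in their first step.
seq_a = [[1.0, 0.0], [0.0, 1.0]]
seq_b = [[0.0, 0.0], [0.0, 1.0]]

states_a = rnn_forward(seq_a, W_xh, W_hh, b_h)
states_b = rnn_forward(seq_b, W_xh, W_hh, b_h)
```

Both sequences end with the same input, yet their final hidden states differ, because each state also depends on what the network saw earlier.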
The paper introduces the Transformer, a model that processes and generates sequences (like sentences) using only attention, without recurrence or convolutions.
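The attention operation at the heart of the Transformer can be sketched in a few lines of plain Python. This is a simplified illustration of scaled dot-product attention only: it assumes a single head, leaves out the learned query/key/value projections, masking, and positional encodings, and uses hypothetical toy vectors in place of real token embeddings.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """For each query, return a weighted average of the values.

    Weights are softmax(q . k / sqrt(d_k)) over all keys.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy "token embeddings" (hypothetical). In self-attention the same sequence
# supplies queries, keys, and values; the learned projections are omitted here.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = scaled_dot_product_attention(tokens, tokens, tokens)
```

Every position looks at every other position in a single step, so nothing has to be passed along one token at a time as in an RNN.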