Standard attention is slow for long texts because it compares every token (roughly, every word) with every other token, so its cost grows quadratically with the length of the text.
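To make the quadratic cost concrete, here is a minimal sketch in plain Python that builds the score table attention needs and counts its entries. The helper names (`attention_scores`, `num_comparisons`, `toy_tokens`) and the toy embeddings are made up for illustration; real implementations use optimized tensor libraries, but the table is the same size.

```python
import math

def attention_scores(queries, keys):
    """Scaled dot-product score for every (query, key) pair."""
    d = len(queries[0])
    scale = math.sqrt(d)
    return [
        [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        for q in queries
    ]

def num_comparisons(scores):
    """Count how many pairwise scores were computed."""
    return sum(len(row) for row in scores)

def toy_tokens(n, d=4):
    """Deterministic stand-in embeddings (hypothetical, for illustration only)."""
    return [[math.sin(i + j) for j in range(d)] for i in range(n)]

for n in (8, 16, 32):
    x = toy_tokens(n)
    print(n, num_comparisons(attention_scores(x, x)))
```

Each time the text gets twice as long, the number of comparisons gets four times as large: 64, then 256, then 1024.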
GRAPE is a new way to tell Transformers where each word is in a sentence, using mathematical transformations called group actions.
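A group action can sound abstract, so here is a small sketch of one familiar example: rotating a 2D vector by an angle proportional to its position, as rotary position embeddings do. This is an illustration of the general idea only, not GRAPE's actual construction; the function names and the step size `theta` are assumptions chosen for the example.

```python
import math

def rotate(vec, position, theta=0.1):
    """Act on a 2D vector by rotating it through position * theta radians."""
    angle = position * theta
    c, s = math.cos(angle), math.sin(angle)
    x, y = vec
    return (c * x - s * y, s * x + c * y)

def dot(a, b):
    """Plain dot product of two 2D vectors."""
    return a[0] * b[0] + a[1] * b[1]

q = (1.0, 2.0)   # toy query vector (hypothetical values)
k = (0.5, -1.0)  # toy key vector (hypothetical values)

# Group-action property: moving 3 steps and then 4 steps equals moving 7 steps.
two_steps = rotate(rotate(q, 3), 4)
one_step = rotate(q, 7)

# Consequence: the score depends only on how far apart the two positions are.
score_near_start = dot(rotate(q, 3), rotate(k, 7))
score_shifted = dot(rotate(q, 10), rotate(k, 14))

print(math.isclose(two_steps[0], one_step[0], abs_tol=1e-12))
print(math.isclose(score_near_start, score_shifted, abs_tol=1e-12))
```

Both checks should come out true: positions combine the way steps along a line do, and shifting both words by the same amount leaves their attention score unchanged.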