This paper introduces a way to make a language model pay extra attention to the exact words you highlight in a prompt.
Transformers slow down on very long inputs because standard attention scores every pair of tokens, so its cost grows quadratically with input length.
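To make that cost concrete, here is a minimal sketch of standard single-head scaled dot-product attention in plain Python. It counts how many query-key scores get computed. This is an illustration only, not the paper's method: the function names `naive_attention` and `toy_tokens`, the toy embeddings, and the embedding size of 4 are all assumptions made for this example.

```python
import math


def naive_attention(queries, keys, values):
    """Single-head scaled dot-product attention over lists of float vectors.

    Returns (outputs, pair_count), where pair_count is the number of
    query-key scores that were computed.
    """
    d = len(queries[0])
    outputs = []
    pair_count = 0
    for q in queries:
        # Score this query against every key: one dot product per token pair.
        scores = []
        for k in keys:
            scores.append(sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d))
            pair_count += 1
        # Numerically stable softmax over the scores for this query.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output is the attention-weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs, pair_count


def toy_tokens(n, d=4):
    # Hypothetical deterministic embeddings, used only for illustration.
    return [[math.sin(i + j) for j in range(d)] for i in range(n)]


if __name__ == "__main__":
    for n in (8, 16, 32):
        x = toy_tokens(n)
        _, pairs = naive_attention(x, x, x)
        print(n, pairs)  # prints: 8 64, then 16 256, then 32 1024
```

Doubling the number of tokens quadruples the number of scores, which is why very long inputs become expensive under standard attention.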