This paper teaches a new way to make a language model pay extra attention to the exact words you highlight in a prompt.
This paper explains how AI agents remember things across long conversations and why many current tests don't truly measure that memory.
Transformers slow down on very long inputs because standard attention compares every token with every other token, so its time and memory cost grows quadratically with sequence length.
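The quadratic cost above can be seen directly in the shape of the attention score matrix. Below is a minimal NumPy sketch (the function name `attention_scores` and the toy sizes are illustrative, not from any of the papers summarized here): for a sequence of length n, full attention materializes an n-by-n matrix, so doubling the input length quadruples the work.

```python
import numpy as np

def attention_scores(q, k):
    # Standard (full) attention compares every query with every key,
    # producing an n x n score matrix: cost is quadratic in length n.
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)                      # shape (n, n)
    # Softmax over each row (numerically stabilized).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights

rng = np.random.default_rng(0)
n, d = 8, 4                      # toy sequence length and head dimension
q = rng.standard_normal((n, d))
k = rng.standard_normal((n, d))
w = attention_scores(q, k)
assert w.shape == (n, n)         # n^2 entries drive the quadratic cost
```

Each row of `w` is a probability distribution over all n tokens, which is why no token pair can be skipped without changing the computation.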