Long contexts slow generation because the model stores the keys and values of every previous token in a KV cache and must attend over all of them for each new token, so per-token latency and memory grow with context length. Processing the prompt is expensive for a related reason: standard self-attention scores every pair of tokens, so its cost grows quadratically with sequence length.
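As a rough illustration (a minimal sketch, not any particular model's implementation), the single-head decoding loop below shows where both costs come from: the cache gains one key/value row per generated token, and every step re-reads the entire cache, so step t does O(t) work and generating n tokens does O(n^2) work overall. The head dimension and tensor shapes are illustrative assumptions.

```python
import numpy as np

d = 64                                     # head dimension (illustrative)
rng = np.random.default_rng(0)

def attend(q, K, V):
    """Attention for one query over all cached keys/values: O(len(K)) work."""
    scores = K @ q / np.sqrt(d)            # one score per cached token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()               # softmax over the whole history
    return weights @ V                     # weighted sum of cached values

# KV cache: grows by one (key, value) row per generated token.
K_cache = np.empty((0, d))
V_cache = np.empty((0, d))

for step in range(1000):
    q = rng.standard_normal(d)             # query for the current token
    k = rng.standard_normal(d)             # its key and value join the cache
    v = rng.standard_normal(d)
    K_cache = np.vstack([K_cache, k])
    V_cache = np.vstack([V_cache, v])
    out = attend(q, K_cache, V_cache)      # re-reads the full cache each step
```

At step t the cache holds 2*t*d floats that must all be read back, which is why long-context decoding tends to be bound by memory bandwidth rather than arithmetic, while the pairwise scoring during prompt processing is what makes prefill quadratic.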