Transformers slow down on very long inputs because standard self-attention computes a score for every pair of tokens, so its time and memory cost grows quadratically with sequence length.
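As a rough illustration, here is a minimal scaled dot-product attention sketch in NumPy (shapes and sizes are made up for the example, and real implementations are batched and fused). The point is the `(seq_len, seq_len)` score matrix: one entry per token pair, so doubling the input length quadruples its size.

```python
import numpy as np

def attention(Q, K, V):
    """Q, K, V: (seq_len, d) arrays. Returns a (seq_len, d) array."""
    d = Q.shape[-1]
    # scores has shape (seq_len, seq_len): one score per token pair,
    # so both compute and memory grow quadratically with seq_len.
    scores = Q @ K.T / np.sqrt(d)
    # numerically stable softmax over each row
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

seq_len, d = 1024, 64  # illustrative sizes
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((seq_len, d)) for _ in range(3))
out = attention(Q, K, V)  # the (1024, 1024) score matrix dominates the cost
```

At 1,024 tokens the score matrix has about a million entries; at 32,768 tokens it has over a billion, which is why long-context work focuses on avoiding or approximating this full pairwise step.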
Large language models tend to get more capable as their parameter counts grow, but every additional parameter is another weight that must be stored, so memory requirements scale directly with model size.
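For a sense of scale, weight memory is roughly parameter count times bytes per parameter. A back-of-the-envelope sketch (the model sizes below are illustrative, not any specific model):

```python
# Rough weight-memory estimates for a few hypothetical model sizes.
for params_billions in (7, 70, 400):
    for bytes_per_weight, fmt in ((4, "fp32"), (2, "fp16"), (1, "int8")):
        gb = params_billions * bytes_per_weight  # 1e9 params * bytes / 1e9 bytes-per-GB
        print(f"{params_billions}B params @ {fmt}: ~{gb} GB just for weights")
```

A 70B-parameter model in fp16 needs on the order of 140 GB for weights alone, before activations or optimizer state, which is why quantization to smaller formats like int8 is a common way to fit larger models in limited memory.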