Papers (4)

#FlashAttention-2

Long texts make language models slow because they must keep and re-check a huge memory called the KV cache for every new word they write.

ObjEmbed teaches an AI to understand not just whole pictures, but each object inside them, and to link those objects to the right words.

Fast KVzip is a new way to shrink an LLM’s memory (the KV cache) while keeping answers just as accurate.

Long texts make standard attention in large language models very slow because it checks every word against every other word.

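To make the "every word against every other word" cost concrete, here is a minimal sketch of standard (non-optimized) softmax attention in plain Python. It is not FlashAttention-2 and not taken from any of the listed papers; the function name `naive_attention` and the `score_count` counter are hypothetical, added only to show that the number of query-key scores grows with the square of the sequence length.

```python
import math


def naive_attention(queries, keys, values):
    """Standard softmax attention over lists of equal-length vectors.

    Returns the attended outputs and the number of query-key scores
    computed (an illustrative counter, not part of any real library).
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs = []
    score_count = 0
    for q in queries:
        # One score per key: every token is compared with every other token.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        score_count += len(scores)
        # Numerically stable softmax: subtract the max before exponentiating.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        # Weighted average of the value vectors.
        dim = len(values[0])
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]
        )
    return outputs, score_count


if __name__ == "__main__":
    for n in (4, 8, 16):
        tokens = [[float(i), 1.0] for i in range(n)]
        _, count = naive_attention(tokens, tokens, tokens)
        print(n, "tokens ->", count, "scores")
```

Doubling the number of tokens quadruples the score count, which is the scaling behavior the summaries above describe as the bottleneck for long texts.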