FASA is a training-free method that makes large language models faster and lighter on memory by keeping only the most useful past tokens during decoding.
This paper shows how to turn a big Transformer model into a faster hybrid model that mixes attention and RNN layers, and the conversion needs far less training data than usual (about 2.3B tokens).
OmniTransfer is a single system that learns from a whole reference video, not just one image, so it can copy how things look (identity and style) and how they move (motion, camera, effects).
K-EXAONE is a super-sized language model that speaks six languages and can read very long documents (up to 256,000 tokens) without forgetting important details.
The paper introduces Canon layers, tiny add-ons that let nearby words share information directly, like passing notes along a row of desks.
Large language models usually line words up in fixed order slots, which can waste mental energy and make it harder to find the important parts of a long or noisy text.
GRAPE is a new way to tell Transformers where each word is in a sentence by using neat math moves called group actions.
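To make "group action" a little more concrete, here is a minimal sketch in Python. It is not GRAPE's actual formulation. It assumes the simplest familiar case, a 2D rotation whose angle grows with the token's position, in the spirit of rotary position encodings. The names `rotate` and `score` and the step size `theta` are hypothetical and chosen only for illustration.

```python
import math

def rotate(vec, pos, theta=0.1):
    """Rotate a 2D vector by an angle proportional to its position."""
    angle = pos * theta  # theta is an assumed, illustrative step size
    c, s = math.cos(angle), math.sin(angle)
    x, y = vec
    return (c * x - s * y, s * x + c * y)

def score(q, k, q_pos, k_pos):
    """Dot product of a query and key after each is rotated by its position."""
    rq = rotate(q, q_pos)
    rk = rotate(k, k_pos)
    return rq[0] * rk[0] + rq[1] * rk[1]

q = (1.0, 2.0)
k = (0.5, -1.0)

# Group property: moving 3 positions and then 4 more is the same as moving 7.
step_by_step = rotate(rotate(q, 3), 4)
in_one_go = rotate(q, 7)
composes = all(
    math.isclose(a, b, abs_tol=1e-12)
    for a, b in zip(step_by_step, in_one_go)
)

# Consequence: the score depends only on how far apart the two tokens are.
# Both pairs below are 2 positions apart, so their scores match.
same_offset = math.isclose(
    score(q, k, 5, 3), score(q, k, 12, 10), abs_tol=1e-12
)
```

Under these assumptions, both `composes` and `same_offset` come out true: positions combine the way the group's elements do, so the model sees relative distance rather than absolute slot numbers.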