Fast weight models remember context with a tiny, fixed memory, but standard next-token training teaches them to think only one word ahead.
MiniCPM-SALA is a 9B-parameter language model that mixes two kinds of attention—sparse and linear—to read very long texts quickly and accurately.
The paper fixes a big problem in long video generation: models either forget what happened or slowly drift off-topic over time.
Locas is a new kind of add-on memory for language models that learns during use but touches none of the model’s original weights.
This paper introduces MOSS Transcribe Diarize, a single model that writes down what people say in a conversation, tells who said each part, and marks the exact times—all in one go.
Coding agents used to fix software rely on feedback; unit tests give only pass/fail signals that are often noisy or missing.
Standard attention is slow for long texts because it compares every word with every other word, which takes quadratic time.