The paper studies Mamba-2 (a fast, linear-time sequence model closely related to linear attention) and pares it down to the pieces that actually improve accuracy.
GRAPE is a new way to tell Transformers where each word sits in a sentence, encoding position with mathematical transformations called group actions.