Test-Time Training with KV Binding Is Secretly Linear Attention
Intermediate · Junchen Liu, Sven Elflein et al. · Feb 24 · arXiv
The paper shows that Test-Time Training (TTT) with key-value (KV) binding does not really memorize key-value pairs the way a notebook stores entries; instead, it behaves like a learned linear attention layer.
#Test-Time Training #KV Binding #Linear Attention
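To make the claim concrete, here is a minimal sketch of the simplest case. It is not the paper's own derivation. It assumes a linear inner model f(x) = W x with W initialized to zero, and a binding loss of the form L(W) = -v_t · (W k_t). One gradient step per token then adds lr · v_t k_tᵀ to W, so reading out with the query gives o_t = lr · Σ_{i≤t} (k_i · q_t) v_i, which is exactly unnormalized causal linear attention. The function names `ttt_linear` and `causal_linear_attention` and the learning rate `lr` are hypothetical, chosen for illustration.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, x):
    # W is a list of rows; returns W @ x
    return [dot(row, x) for row in W]


def ttt_linear(qs, ks, vs, lr=1.0):
    """Online TTT with a linear inner model f(x) = W x, W starting at zero.

    Assumed (hypothetical) binding loss per token: L(W) = -v_t . (W k_t).
    Its gradient w.r.t. W is -v_t k_t^T, so one SGD step gives
    W <- W + lr * v_t k_t^T. The output is read out with the query: o_t = W_t q_t.
    """
    d_v, d_k = len(vs[0]), len(ks[0])
    W = [[0.0] * d_k for _ in range(d_v)]
    outs = []
    for q, k, v in zip(qs, ks, vs):
        # gradient step on the binding loss: add lr * outer(v, k)
        for i in range(d_v):
            for j in range(d_k):
                W[i][j] += lr * v[i] * k[j]
        outs.append(matvec(W, q))
    return outs


def causal_linear_attention(qs, ks, vs, lr=1.0):
    """Unnormalized causal linear attention: o_t = lr * sum_{i<=t} (k_i . q_t) v_i."""
    outs = []
    for t, q in enumerate(qs):
        o = [0.0] * len(vs[0])
        for k, v in zip(ks[: t + 1], vs[: t + 1]):
            score = dot(k, q)
            o = [oi + lr * score * vi for oi, vi in zip(o, v)]
        outs.append(o)
    return outs


def rand_vec(n):
    return [random.gauss(0.0, 1.0) for _ in range(n)]


random.seed(0)
T, d_k, d_v = 6, 4, 3
qs = [rand_vec(d_k) for _ in range(T)]
ks = [rand_vec(d_k) for _ in range(T)]
vs = [rand_vec(d_v) for _ in range(T)]

a = ttt_linear(qs, ks, vs, lr=0.5)
b = causal_linear_attention(qs, ks, vs, lr=0.5)
max_err = max(abs(x - y) for oa, ob in zip(a, b) for x, y in zip(oa, ob))
print(f"max abs difference: {max_err:.2e}")
```

Running it should print a difference at floating-point round-off level, since both functions compute the same sum in a different order. The exact match depends on the inner-product loss assumed here: with a squared-error binding loss ½‖W k_t - v_t‖², the gradient step picks up an extra -lr · (W k_t) k_tᵀ term (a delta-rule update), so the correspondence with plain linear attention is no longer term-for-term.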