The paper shows that Test-Time Training (TTT) with keyβvalue (KV) binding is not really memorizing like a notebook; it is acting like a learned linear attention layer.
Nemotron 3 is a new family of open AI models (Nano, Super, Ultra) built to think better while running faster and cheaper.