Papers (3)

#Long Context

Test-Time Training with KV Binding Is Secretly Linear Attention

Junchen Liu, Sven Elflein et al. · Feb 24 · arXiv

The paper shows that Test-Time Training (TTT) with key–value (KV) binding is not really memorizing like a notebook; it is acting like a learned linear attention layer.
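To make the summary concrete, here is a minimal sketch of the simplest case where the equivalence can be checked by hand. It rests on assumptions that are not taken from the paper: the fast weights are a single matrix W initialized to zero, each token triggers one gradient step on the inner-product loss L(W) = -<W k, v>, and the function names (`ttt_linear_memory`, `linear_attention`, `demo`) are made up for illustration. Under those assumptions each step adds lr * v k^T to W, so reading out W q gives the same result as unnormalized linear attention over the stored pairs. The paper's own setting and argument may be more general.

```python
import random


def outer(v, k):
    """Outer product v k^T as a list of rows."""
    return [[vi * kj for kj in k] for vi in v]


def matvec(W, x):
    """Matrix-vector product W x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def ttt_linear_memory(keys, values, query, lr=1.0):
    """One gradient step per token on L(W) = -<W k, v>, starting from W = 0."""
    d_v, d_k = len(values[0]), len(keys[0])
    W = [[0.0] * d_k for _ in range(d_v)]
    for k, v in zip(keys, values):
        # dL/dW = -v k^T for the assumed inner-product loss.
        grad = [[-g for g in row] for row in outer(v, k)]
        W = [[w - lr * g for w, g in zip(w_row, g_row)]
             for w_row, g_row in zip(W, grad)]
    return matvec(W, query)


def linear_attention(keys, values, query, lr=1.0):
    """Unnormalized linear attention: sum_i lr * (k_i . q) * v_i."""
    out = [0.0] * len(values[0])
    for k, v in zip(keys, values):
        score = sum(ki * qi for ki, qi in zip(k, query))
        out = [o + lr * score * vi for o, vi in zip(out, v)]
    return out


def demo(seed=0, n=5, d_k=4, d_v=3):
    """Run both computations on the same random keys, values, and query."""
    rng = random.Random(seed)
    keys = [[rng.gauss(0, 1) for _ in range(d_k)] for _ in range(n)]
    values = [[rng.gauss(0, 1) for _ in range(d_v)] for _ in range(n)]
    query = [rng.gauss(0, 1) for _ in range(d_k)]
    return (ttt_linear_memory(keys, values, query),
            linear_attention(keys, values, query))
```

Calling `demo()` returns the two output vectors, which agree up to floating-point rounding. With a different inner loss or a nonlinear fast-weight model, the update is no longer a plain outer product, so this toy check says nothing about those cases.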

#Test-Time Training · #KV Binding · #Linear Attention

ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation

Beginner

Zihao Huang, Jundong Zhou et al. · Jan 29 · arXiv

ConceptMoE teaches a language model to group easy, similar tokens into bigger ideas called concepts, so it spends more brainpower on the hard parts.

#ConceptMoE · #Mixture of Experts · #Adaptive Compression

NVIDIA Nemotron 3: Efficient and Open Intelligence

Intermediate

NVIDIA et al. · Dec 24 · arXiv

Nemotron 3 is a new family of open AI models (Nano, Super, Ultra) built to think better while running faster and cheaper.

#Nemotron 3 · #Mixture-of-Experts · #LatentMoE
