Papers (7)

#Transformer

MeKi: Memory-based Expert Knowledge Injection for Efficient LLM Scaling

Ning Ding, Fangcheng Liu et al. · Feb 3 · arXiv

MeKi is a new way to grow a language model’s knowledge by using storage (ROM) instead of extra heavy calculations (FLOPs).

#MeKi #memory-based scaling #token-level experts


Deep Delta Learning

Intermediate

Yifan Zhang, Yifeng Liu et al. · Jan 1 · arXiv

Deep Delta Learning (DDL) replaces the usual “add the shortcut” rule in deep networks with a smarter, learnable move that can gently erase old info and write new info along a chosen direction.

#Deep Delta Learning #Delta Operator #Residual connection


Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space

Intermediate

Xingwei Qu, Shaowen Wang et al. · Dec 31 · arXiv

Language is lumpy: easy stretches and tricky jumps are mixed together, but old models spend the same effort on every word.

#Dynamic Large Concept Models #semantic boundaries #latent reasoning


End-to-End Test-Time Training for Long Context

Intermediate

Arnuv Tandon, Karan Dalal et al. · Dec 29 · arXiv

This paper shows how a language model can keep learning while you use it, so it handles very long inputs without slowing down.

#Test-Time Training #Meta-learning #Long-context language modeling


Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Intermediate

NVIDIA et al. · Dec 23 · arXiv

Nemotron 3 Nano is a new open-source language model that mixes two brain styles (Mamba and Transformer) and adds a team of special experts (MoE) so it thinks better while running much faster.

#Mixture-of-Experts #Mamba-2 #Transformer


Bidirectional Normalizing Flow: From Data to Noise and Back

Intermediate

Yiyang Lu, Qiao Sun et al. · Dec 11 · arXiv

Normalizing Flows are models that learn how to turn real images into simple noise and then back again.

#Normalizing Flow #Bidirectional Normalizing Flow #Hidden Alignment


Attention Is All You Need

Intermediate

Ashish Vaswani, Noam Shazeer et al. · Jun 12 · arXiv

The paper introduces the Transformer, a model that understands and generates sequences (like sentences) using only attention, without RNNs or CNNs.

#Transformer #Self-Attention #Multi-Head Attention
