MeKi is a new way to grow a language model’s knowledge by spending storage (ROM) instead of extra computation (FLOPs).
Deep Delta Learning (DDL) replaces the usual “add the shortcut” rule in deep networks with a learnable update that can gently erase old information and write new information along a chosen direction.
Language is lumpy: easy stretches and tricky jumps are mixed together, but older models spend the same effort on every word.
This paper shows how a language model can keep learning while you use it, so it handles very long inputs without slowing down.
Nemotron 3 Nano is a new open-source language model that mixes two brain styles (Mamba and Transformer) and adds a team of special experts (MoE) so it thinks better while running much faster.
Normalizing Flows are models that learn how to turn real images into simple noise and then back again.
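
To make the "into noise and back again" idea concrete, here is a toy sketch of the simplest possible flow: a single invertible shift-and-scale on one number. It is an illustration only, not the method of any paper listed above. The class name `AffineFlow` and its parameters `shift` and `scale` are made up for this example, and a real flow would stack many learned invertible layers over whole images.

```python
import math


class AffineFlow:
    """Toy one-dimensional flow (hypothetical name): z = (x - shift) / scale."""

    def __init__(self, shift, scale):
        if scale <= 0:
            raise ValueError("scale must be positive so the map is invertible")
        self.shift = shift
        self.scale = scale

    def forward(self, x):
        # Data -> noise: remove the shift, then divide out the scale.
        return (x - self.shift) / self.scale

    def inverse(self, z):
        # Noise -> data: undo forward() exactly.
        return z * self.scale + self.shift

    def log_prob(self, x):
        # Change of variables: log p(x) = log N(z; 0, 1) + log |dz/dx|,
        # where dz/dx = 1 / scale for this map.
        z = self.forward(x)
        log_base = -0.5 * (z * z + math.log(2.0 * math.pi))
        log_det = -math.log(self.scale)
        return log_base + log_det


flow = AffineFlow(shift=3.0, scale=2.0)
z = flow.forward(7.0)        # 2.0
x_back = flow.inverse(z)     # 7.0, the original value is recovered
```

Because the map can be undone exactly, the same object both scores data (`log_prob`) and generates it (draw `z` from a standard normal, then call `inverse`).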