Nemotron 3 Nano is a new open-source language model that mixes two architectures (Mamba state-space layers and Transformer attention) and adds a team of specialized experts (a Mixture-of-Experts, or MoE, layer), so it reasons better while running much faster.
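To make the MoE part concrete, here is a minimal sketch of top-k expert routing in PyTorch: the mechanism that lets only a few experts run per token while the rest stay idle. Every size, name, and layer below is an illustrative assumption, not Nemotron's actual configuration.

```python
# Minimal top-k Mixture-of-Experts sketch (illustrative sizes, not the real model).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        weights, idx = self.router(x).topk(self.k, dim=-1)  # keep k best experts per token
        weights = F.softmax(weights, dim=-1)                # normalize the kept scores
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            rows, slot = (idx == e).nonzero(as_tuple=True)  # tokens routed to expert e
            if rows.numel():
                out[rows] += weights[rows, slot, None] * expert(x[rows])
        return out

x = torch.randn(16, 512)
print(TopKMoE()(x).shape)  # torch.Size([16, 512]); only 2 of the 8 experts run per token
```

This is why "active parameters per token" can be far smaller than total parameters: each token's forward pass touches only the k experts the router picks.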
INTELLECT-3 is a 106B-parameter Mixture-of-Experts model (about 12B active per token) trained with large-scale reinforcement learning, and it outperforms many larger models on math, coding, science, and reasoning benchmarks.
Large language models get smarter when they get bigger, but storing all those extra weights eats tons of memory.
Recursive transformers save memory by reusing the same layer over and over, but applying identical weights at every step makes them less expressive and hurts accuracy.
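To see what "reusing the same layer" means in code, here is a toy PyTorch sketch of the recursive idea: one shared block applied repeatedly, so compute depth grows while the parameter count stays fixed. The block choice and sizes are assumptions for illustration, not any specific paper's design.

```python
# Toy recursive transformer: 6 layers of compute from 1 layer of weights
# (illustrative only; real recursive models vary the recipe).
import torch
import torch.nn as nn

class RecursiveTransformer(nn.Module):
    def __init__(self, d_model=256, n_head=4, n_steps=6):
        super().__init__()
        self.n_steps = n_steps
        # a single block whose weights are reused at every "layer"
        self.shared_block = nn.TransformerEncoderLayer(
            d_model, n_head, dim_feedforward=1024, batch_first=True
        )

    def forward(self, x):
        for _ in range(self.n_steps):  # deeper computation...
            x = self.shared_block(x)   # ...without any new parameters
        return x

model = RecursiveTransformer()
print(sum(p.numel() for p in model.parameters()))  # same count for n_steps=1 or 6
print(model(torch.randn(2, 10, 256)).shape)        # torch.Size([2, 10, 256])
```

The memory savings are exactly this parameter sharing; the accuracy cost comes from every pass being forced through the same weights, which is the tension the summarized paper is about.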