In large language model (LLM) post-training, the work per GPU is uneven because some text sequences are much longer than others.
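A small sketch can make the imbalance concrete. It assumes round-robin assignment of sequences to GPUs, a cost that grows linearly with token count, and a synchronous step where every GPU waits for the slowest one. The sequence lengths, the GPU count, and the helper names `per_gpu_tokens` and `imbalance_ratio` are made up for illustration and do not come from the source.

```python
# Illustrative only: sequence lengths and GPU count are made-up values.

def per_gpu_tokens(seq_lengths, num_gpus):
    """Assign sequences round-robin and return the total tokens per GPU."""
    loads = [0] * num_gpus
    for i, length in enumerate(seq_lengths):
        loads[i % num_gpus] += length
    return loads


def imbalance_ratio(loads):
    """Max load divided by mean load; 1.0 means perfectly balanced."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean


# A few long sequences mixed with many short ones (hypothetical batch).
seq_lengths = [8192, 256, 512, 128, 4096, 300, 200, 700]
loads = per_gpu_tokens(seq_lengths, num_gpus=4)

print(loads)                             # [12288, 556, 712, 828]
print(round(imbalance_ratio(loads), 2))  # 3.42
```

Under these assumptions the busiest GPU handles more than three times the average token count, so the other GPUs sit idle for most of the step.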
Nemotron 3 Nano is a new open-source language model that combines two architectures (Mamba and Transformer) with mixture-of-experts (MoE) layers, so it reasons better while running much faster.
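To show why an MoE layer can be fast, here is a minimal sketch of generic top-k expert routing, in which only a few experts run for each token. It is not Nemotron 3 Nano's actual router. The toy experts, the router scores, and the function names `route_top_k` and `moe_forward` are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized so they sum to 1 over the selected experts.
    """
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, experts, router_scores, k):
    """Weighted sum of the outputs of only the k selected experts."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_scores, k))


# Hypothetical toy experts: each is a simple scalar function.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
router_scores = [0.1, 2.0, 1.5, -1.0]  # made-up router outputs for one token

selected = route_top_k(router_scores, k=2)
print([i for i, _ in selected])  # [1, 2]: only two of four experts run
print(moe_forward(3.0, experts, router_scores, k=2))
```

With `k=2` of four experts, only half of the expert computation runs for this token, which is the general reason sparse MoE layers can add capacity without a matching increase in per-token cost.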