Post-training of large language models (LLMs) distributes work unevenly across GPUs because some text sequences are much longer than others.
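The imbalance can be illustrated with a minimal sketch. It assumes token count is a rough proxy for per-GPU work and that sequences are assigned round-robin; both are simplifications chosen for illustration, and the function names and sequence lengths are hypothetical rather than taken from any particular training system.

```python
def per_gpu_tokens(seq_lengths, num_gpus):
    """Assign sequences round-robin to GPUs and return total tokens per GPU.

    Round-robin assignment is an assumption made for this sketch only.
    """
    totals = [0] * num_gpus
    for i, n in enumerate(seq_lengths):
        totals[i % num_gpus] += n
    return totals


def imbalance(totals):
    """Ratio of the busiest GPU's load to the mean load (1.0 means balanced)."""
    mean = sum(totals) / len(totals)
    return max(totals) / mean


# Hypothetical batch: mostly short sequences plus one very long one.
lengths = [128, 256, 8192, 512, 128, 384, 256, 1024]
totals = per_gpu_tokens(lengths, num_gpus=4)

print(totals)             # tokens handled by each GPU
print(imbalance(totals))  # how far the busiest GPU is above the average
```

In a synchronous training step every GPU waits for the slowest one, so a single long sequence can leave the other devices idle for most of the step. Because attention cost grows faster than linearly with sequence length, counting tokens likely understates the real gap.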
SonicMoE makes Mixture-of-Experts (MoE) models train faster and use less memory by redesigning how data is moved and processed on GPUs.