SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations
Intermediate · Wentao Guo, Mayank Mishra et al. · Dec 16 · arXiv
SonicMoE speeds up Mixture-of-Experts (MoE) training and reduces its memory footprint by redesigning how data is moved (IO) and how computation is tiled on GPUs.
#Mixture of Experts · #Grouped GEMM · #Token Rounding