A Mixture of Experts (MoE) layer uses a learned gating network to route each input to a small subset of specialized subnetworks called experts, so only a fraction of the model's parameters are active for any given input (conditional computation).
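A minimal sketch of top-k MoE routing, assuming randomly initialized linear experts and a linear gate (the class and parameter names here are illustrative, not from any particular library):

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

class MoELayer:
    """Top-k mixture of experts: a linear gate scores every expert,
    only the k best-scoring experts actually run on the input, and
    their outputs are averaged with renormalized gate weights."""
    def __init__(self, n_experts, dim, k=2, seed=0):
        rng = random.Random(seed)
        self.k = k
        # Each expert is a dim x dim linear map (toy stand-in for an MLP).
        self.experts = [[[rng.gauss(0, 0.1) for _ in range(dim)]
                         for _ in range(dim)] for _ in range(n_experts)]
        # Gate: one scoring vector per expert.
        self.gate = [[rng.gauss(0, 0.1) for _ in range(dim)]
                     for _ in range(n_experts)]

    def __call__(self, x):
        scores = [sum(w * xi for w, xi in zip(row, x)) for row in self.gate]
        probs = softmax(scores)
        # Select the top-k experts; the rest are skipped entirely.
        topk = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:self.k]
        norm = sum(probs[i] for i in topk)
        out = [0.0] * len(x)
        for i in topk:
            weight = probs[i] / norm
            y = [sum(m * xi for m, xi in zip(row, x)) for row in self.experts[i]]
            out = [o + weight * yi for o, yi in zip(out, y)]
        return out
```

With 8 experts and k=2, each input touches only 2 of the 8 expert weight matrices, which is the source of the compute savings.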
Dropout randomly zeroes a fraction of neuron activations during training, which discourages co-adaptation between neurons and reduces overfitting; at inference time all neurons are kept active.
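A sketch of "inverted" dropout, the common formulation in which survivors are scaled by 1/(1-p) during training so the expected activation is unchanged and inference needs no rescaling (the function name and signature are illustrative):

```python
import random

def dropout(xs, p=0.5, training=True, rng=random):
    """Inverted dropout over a list of activations.

    During training, each value is zeroed with probability p and the
    survivors are scaled by 1/(1-p) so the expected value matches the
    no-dropout case. At inference the input passes through unchanged.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(xs)
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in xs]
```

Note that each training pass samples a fresh random mask, so the same input can produce different outputs across calls.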