Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
Intermediate | Ang Lv, Jin Ma et al. | Dec 29 | arXiv
Mixture-of-Experts (MoE) models use many small specialist networks (experts) and a router to pick which experts handle each token, but the router isn't explicitly taught what each expert is good at.
#Mixture-of-Experts #expert-router coupling #auxiliary loss
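
To make the routing step in the summary concrete, here is a minimal sketch of generic top-k softmax routing in plain Python. It is not the paper's method and does not include its auxiliary loss; the names `route_token`, `moe_forward`, `router_weights`, and the toy experts are all hypothetical and chosen only for illustration. Note that the router scores depend only on the router's own weights, which is the gap the summary points at.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(token, router_weights, k=2):
    """Return [(expert_index, gate_weight), ...] for the k highest-scoring experts."""
    # One score per expert: dot product of the token with that expert's router row.
    scores = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    probs = softmax(scores)
    # Keep the k most probable experts and renormalize their gates to sum to 1.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(token, router_weights, experts, k=2):
    """Combine the outputs of the selected experts, weighted by their gates."""
    out = [0.0] * len(token)
    for i, gate in route_token(token, router_weights, k):
        y = experts[i](token)
        out = [o + gate * v for o, v in zip(out, y)]
    return out

# Toy setup (assumed values): three experts over 2-dimensional tokens.
router_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
experts = [
    lambda x: [2.0 * v for v in x],   # hypothetical expert 0: doubles the input
    lambda x: [v + 1.0 for v in x],   # hypothetical expert 1: adds one
    lambda x: [0.0 for _ in x],       # hypothetical expert 2: outputs zeros
]
token = [3.0, 1.0]

selected = route_token(token, router_weights, k=2)
output = moe_forward(token, router_weights, experts, k=2)
print(selected)
print(output)
```

In this toy setup the router scores are 3, 1, and -4, so experts 0 and 1 are selected and expert 2 is never run for this token. Nothing in the routing computation looks at what the experts actually compute.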