The knowledge distillation loss blends standard hard-label cross-entropy with a soft matching term, typically the KL divergence between the teacher's and student's output distributions, both softened by a temperature parameter.
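A minimal PyTorch sketch of this loss, assuming the usual formulation where both logits are divided by the temperature and the soft term is scaled by T² to keep gradients balanced; the weighting `alpha` and the default temperature are illustrative choices, not prescribed values.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      temperature=4.0, alpha=0.5):
    # Hard-label term: ordinary cross-entropy against ground-truth class indices.
    hard_loss = F.cross_entropy(student_logits, targets)

    # Soft term: KL divergence between temperature-softened teacher and student
    # distributions. The T^2 factor compensates for the 1/T scaling of gradients.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    soft_loss = F.kl_div(soft_student, soft_teacher,
                         reduction="batchmean") * temperature ** 2

    # alpha controls how much weight the distillation (soft) term receives.
    return alpha * soft_loss + (1 - alpha) * hard_loss
```

In practice the temperature is often swept (e.g. 2 to 10); higher values expose more of the teacher's relative class preferences ("dark knowledge") in the soft targets.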
Dropout can be interpreted as variational inference in a Bayesian neural network, where applying random masks approximates sampling from a posterior over weights.
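Under this view, keeping dropout active at prediction time and averaging several stochastic forward passes (Monte Carlo dropout) approximates the posterior predictive. The sketch below illustrates the idea; the network shape, layer sizes, and the helper names `MCDropoutNet` and `mc_dropout_predict` are assumptions for the example, not part of any particular library API.

```python
import torch
import torch.nn as nn

class MCDropoutNet(nn.Module):
    """Small MLP whose dropout layers are kept active at prediction time."""
    def __init__(self, in_dim=10, hidden=64, out_dim=1, p=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        return self.net(x)

@torch.no_grad()
def mc_dropout_predict(model, x, n_samples=50):
    # Keep the model in train mode so every forward pass draws a fresh random
    # mask, i.e. one sample from the approximate posterior over weights.
    model.train()
    preds = torch.stack([model(x) for _ in range(n_samples)])
    # Mean approximates the posterior predictive; variance reflects uncertainty.
    return preds.mean(dim=0), preds.var(dim=0)
```

Usage is simply `mean, var = mc_dropout_predict(model, x_batch)`; higher per-input variance flags predictions the (approximate) posterior is less certain about.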