Knowledge distillation loss blends standard hard-label cross-entropy on the ground-truth labels with a soft-target term that matches the student's temperature-softened output distribution to the teacher's, typically a KL divergence scaled by T² so its gradient magnitude stays comparable across temperatures.
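A minimal sketch of this loss, assuming PyTorch; the weighting `alpha` and temperature `T` defaults are illustrative choices, not values from the note:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Hard-label term: ordinary cross-entropy against ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    # Soft-target term: KL divergence between temperature-softened
    # distributions; the T**2 factor compensates for the 1/T gradient scaling.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Blend the two terms, with alpha weighting the soft teacher match.
    return (1.0 - alpha) * hard + alpha * soft
```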
The Lottery Ticket Hypothesis (LTH) states that a large dense neural network contains small sparse subnetworks ("winning tickets") that, when trained in isolation from their original initialization, can match the accuracy of the full model in a comparable number of training iterations.
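A minimal sketch of the iterative magnitude-pruning procedure used to find such tickets, assuming PyTorch; `train`, the 20% pruning rate, and the round count are hypothetical placeholders:

```python
import copy
import torch

def find_winning_ticket(model, train, prune_frac=0.2, rounds=3):
    # Save the original initialization so we can rewind to it after pruning.
    init_state = copy.deepcopy(model.state_dict())
    # One binary mask per weight matrix (biases left unpruned here).
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters()
             if p.dim() > 1}
    for _ in range(rounds):
        train(model, masks)  # train to convergence with pruned weights masked out
        for name, p in model.named_parameters():
            if name in masks:
                mags = (p * masks[name]).abs()
                # Prune the smallest prune_frac of the still-surviving weights.
                surviving = mags[masks[name].bool()]
                threshold = torch.quantile(surviving, prune_frac)
                masks[name] = (mags > threshold).float() * masks[name]
        # Rewind surviving weights to their original initialization,
        # the step that distinguishes LTH from ordinary prune-and-finetune.
        model.load_state_dict(init_state)
    return masks
```

The rewind step is the crux of the hypothesis: the sparse subnetwork is retrained from the *original* random initialization rather than from the trained weights.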