Knowledge distillation loss blends standard hard-label cross-entropy with a soft term that matches the student's output distribution to the teacher's, with both distributions softened by a temperature parameter.
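A minimal PyTorch-style sketch of this blend; the function name `distillation_loss` and the defaults for the temperature `T` and mixing weight `alpha` are illustrative, not prescribed by the text above:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Hard-label term: standard cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    # Soft term: KL divergence between temperature-softened distributions.
    # F.kl_div expects log-probabilities for the input and probabilities
    # for the target; the T**2 factor keeps gradient magnitudes roughly
    # comparable across temperatures (a common convention).
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T ** 2)
    # Blend the two terms with the mixing weight alpha.
    return alpha * hard_loss + (1 - alpha) * soft_loss

# Toy usage with random logits for a 10-class problem.
student = torch.randn(4, 10)
teacher = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
print(distillation_loss(student, teacher, labels))
```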
Label smoothing replaces a hard one-hot target with a slightly softened distribution, spreading a small probability mass uniformly over all classes, to reduce model overconfidence.
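A short sketch of how the softened target can be built by hand, alongside the built-in shortcut; `smooth_targets` and the smoothing value `eps=0.1` are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def smooth_targets(labels, num_classes, eps=0.1):
    # Standard smoothing: target = (1 - eps) * one_hot + eps / K uniform,
    # so the true class gets 1 - eps + eps/K and every other class eps/K.
    off = eps / num_classes
    targets = torch.full((labels.size(0), num_classes), off)
    targets.scatter_(1, labels.unsqueeze(1), 1.0 - eps + off)
    return targets

logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))

# Manual version: cross-entropy against the softened distribution.
manual = -(smooth_targets(labels, 10) * F.log_softmax(logits, dim=-1)).sum(1).mean()

# PyTorch (>= 1.10) also supports this directly via label_smoothing.
builtin = F.cross_entropy(logits, labels, label_smoothing=0.1)
print(manual, builtin)  # the two values agree
```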