Groups
Multi-task loss balancing aims to automatically set each task’s weight so that no single loss dominates training.
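One widely used instance of this idea is uncertainty-based weighting (Kendall et al., 2018), where each task gets a learnable log-variance that scales its loss and a regularizer stops the weights from collapsing to zero. The sketch below is a minimal PyTorch illustration of that scheme; the class name and initialization are illustrative choices, not a reference implementation.

```python
import torch
import torch.nn as nn

class UncertaintyWeighting(nn.Module):
    """Learnable per-task loss weights via homoscedastic uncertainty
    (Kendall et al., 2018). Each task loss L_i is scaled by exp(-s_i)
    and regularized by +s_i, where s_i = log(sigma_i^2)."""

    def __init__(self, num_tasks: int):
        super().__init__()
        # s_i = log(sigma_i^2); zero init gives every task weight 1.0.
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses: list[torch.Tensor]) -> torch.Tensor:
        total = torch.zeros((), device=self.log_vars.device)
        for s, loss in zip(self.log_vars, task_losses):
            # exp(-s) down-weights high-uncertainty (noisy) tasks;
            # the +s term penalizes inflating sigma to ignore a task.
            total = total + torch.exp(-s) * loss + s
        return total
```

In use, the module's parameters must be passed to the optimizer alongside the model's, so the weights adapt during training: `optimizer = torch.optim.Adam(list(model.parameters()) + list(weighting.parameters()))`.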
Knowledge distillation loss blends standard hard-label cross-entropy with a soft term that matches the student's predicted distribution to the teacher's, with both sets of logits softened by a temperature parameter.
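A minimal PyTorch sketch of this blend, following the formulation of Hinton et al. (2015): the soft term is a KL divergence between temperature-softened distributions, scaled by T² so its gradients stay on the same scale as the hard term. The `temperature` and `alpha` defaults here are illustrative, not canonical values.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 4.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """alpha blends the hard-label term against the soft teacher-match term."""
    # Hard term: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)

    # Soft term: KL divergence between temperature-softened student and
    # teacher distributions. Higher T exposes more of the teacher's
    # "dark knowledge" in the non-target classes.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2  # rescale gradients to match the hard term

    return alpha * hard + (1.0 - alpha) * soft
```

During training, the teacher's logits are typically computed under `torch.no_grad()` so only the student receives gradient updates.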