Groups
Category
Stochastic Depth randomly drops whole residual layers during training while keeping the full network at inference time.
Spectral regularization controls how much a weight matrix can stretch inputs by constraining its largest singular value (spectral norm).