Depth adds compositional power: stacking layers lets neural networks represent functions with many repeated patterns using far fewer neurons than a single wide layer.
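One standard way to illustrate this claim, offered here as a minimal sketch rather than a general proof, is a sawtooth built by composing a two-unit ReLU "tent" layer with itself. The function names (`tent`, `deep_sawtooth`, `count_linear_pieces`, `neuron_counts`) and the grid size are illustrative assumptions, not part of the original statement. The shallow lower bound relies on the fact that a one-hidden-layer ReLU network with n units on a scalar input has at most n + 1 linear pieces.

```python
def relu(x):
    return max(0.0, x)


def tent(x):
    # One hidden layer with two ReLU units.
    # On [0, 1] this equals 2x for x <= 0.5 and 2 - 2x for x >= 0.5.
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)


def deep_sawtooth(x, depth):
    # Stack the same two-unit layer `depth` times.
    for _ in range(depth):
        x = tent(x)
    return x


def count_linear_pieces(depth, grid=2 ** 12):
    # Count slope sign changes on a dyadic grid over [0, 1].
    # Valid for depth <= log2(grid), so each grid cell lies in one piece.
    xs = [i / grid for i in range(grid + 1)]
    ys = [deep_sawtooth(x, depth) for x in xs]
    slopes = [ys[i + 1] - ys[i] for i in range(grid)]
    changes = sum(1 for a, b in zip(slopes, slopes[1:]) if (a > 0) != (b > 0))
    return changes + 1


def neuron_counts(depth):
    # Deep network: 2 ReLU units per layer.
    # Shallow network: needs at least (pieces - 1) units for 2**depth pieces.
    return 2 * depth, 2 ** depth - 1


for d in (1, 3, 5, 10):
    deep, shallow = neuron_counts(d)
    print(d, count_linear_pieces(d), deep, shallow)
```

At depth 10 the stacked network uses 20 ReLU units to produce 1024 linear pieces, while a single hidden layer would need at least 1023 units to match that piece count.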
Empirical Risk Minimization (ERM) chooses, from a given set of candidate models, the one that minimizes the average loss on the training data.
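The rule can be sketched over a small finite set of candidates. The example below assumes a hypothetical family of one-dimensional threshold classifiers, the 0-1 loss, and a made-up toy training set; none of these come from the original statement, and real ERM typically searches a much larger model family with an optimizer rather than by enumeration.

```python
def zero_one_loss(prediction, label):
    return 0.0 if prediction == label else 1.0


def empirical_risk(predict, data, loss=zero_one_loss):
    # Average loss of `predict` over the training pairs (x, y).
    return sum(loss(predict(x), y) for x, y in data) / len(data)


def make_threshold_classifier(t):
    # Hypothetical model family: predict 1 when x >= t, else 0.
    return lambda x: 1 if x >= t else 0


def erm_threshold(data, thresholds):
    # ERM by enumeration: return the threshold with the lowest training risk.
    return min(
        thresholds,
        key=lambda t: empirical_risk(make_threshold_classifier(t), data),
    )


# Toy training data (illustrative only); the point (0.7, 0) is label noise.
TRAIN = [(0.1, 0), (0.3, 0), (0.45, 1), (0.6, 1), (0.9, 1), (0.7, 0)]
THRESHOLDS = [0.0, 0.2, 0.4, 0.8]

best_t = erm_threshold(TRAIN, THRESHOLDS)
best_risk = empirical_risk(make_threshold_classifier(best_t), TRAIN)
print(best_t, best_risk)
```

Here the selected threshold still misclassifies the noisy point, which shows that ERM minimizes training error within the candidate set and does not guarantee zero error.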