Groups
Depth adds compositional power: stacking layers lets a neural network reuse computation, so functions built from many repeated patterns can be represented with far fewer neurons (sometimes exponentially fewer) than a single wide hidden layer would need.
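As a concrete (and classic) illustration of this depth efficiency, here is a minimal NumPy sketch: composing a tiny two-ReLU "tent" layer with itself k times produces a sawtooth with 2^k linear pieces, even though the whole network uses only 2k hidden units, whereas a single hidden ReLU layer would need on the order of 2^k units to produce the same function. The helper names `hat` and `deep_sawtooth` are invented for this sketch, not taken from any library.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def hat(x):
    # A "tent" map on [0, 1] built from just two ReLU units:
    # it rises from 0 to 1 on [0, 0.5] and falls back to 0 on [0.5, 1].
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

def deep_sawtooth(x, depth):
    # Composing the tent map with itself `depth` times yields a sawtooth
    # with 2**depth linear pieces, using only 2 * depth hidden units total.
    for _ in range(depth):
        x = hat(x)
    return x

# Evaluate on a grid fine enough to resolve every linear piece.
xs = np.linspace(0.0, 1.0, 1025)
for depth in (1, 3, 6):
    ys = deep_sawtooth(xs, depth)
    # Count the linear pieces by counting sign changes in the numerical slope.
    slope_signs = np.sign(np.diff(ys))
    pieces = 1 + np.count_nonzero(np.diff(slope_signs))
    print(f"depth={depth}: {pieces} linear pieces (2**depth = {2**depth})")
```

The doubling of linear pieces with every extra layer is exactly the kind of repeated structure that a shallow network cannot exploit: flattening the composition into one layer would require one hidden unit per piece.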
Width alone is enough in principle, though: the Universal Approximation Theorems state that a neural network with a single hidden layer and a suitable nonlinear activation can approximate any continuous function on a compact domain as closely as you like, provided the hidden layer is allowed to be wide enough.
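To see the theorem's spirit (this is an illustration, not a proof), the sketch below builds a single hidden layer of ReLU units whose output is the piecewise-linear interpolant of a target function at evenly spaced knots on [0, 1]; adding hidden units drives the maximum error toward zero. It assumes only NumPy, uses ReLU for simplicity (one of many activations the theorem covers), and the helper `one_hidden_layer_approx` is a name made up here.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def one_hidden_layer_approx(f, n_units, x):
    # One hidden ReLU layer whose output is the piecewise-linear
    # interpolant of f at n_units + 1 evenly spaced knots in [0, 1].
    knots = np.linspace(0.0, 1.0, n_units + 1)
    slopes = np.diff(f(knots)) / np.diff(knots)
    # Output weights are the slope *changes* between segments, so the sum of
    # shifted ReLUs matches f exactly at every knot.
    weights = np.diff(slopes, prepend=0.0)
    hidden = relu(x[:, None] - knots[None, :-1])   # shape (len(x), n_units)
    return f(knots[0]) + hidden @ weights

def target(x):
    # Any continuous function on [0, 1] would do here.
    return np.sin(2.0 * np.pi * x) + 0.3 * x

xs = np.linspace(0.0, 1.0, 2001)
for n in (4, 16, 64, 256):
    err = np.max(np.abs(one_hidden_layer_approx(target, n, xs) - target(xs)))
    print(f"{n:4d} hidden units -> max error {err:.5f}")
```

Each hidden unit handles one "kink" of the approximation, so accuracy improves steadily as units are added; the catch, as the previous paragraph notes, is that a shallow layer may need very many units where a deeper network could reuse structure.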