Transformer expressiveness studies what kinds of sequence-to-sequence mappings a Transformer can represent or approximate.
Depth adds compositional power: stacking layers lets a neural network represent highly repetitive, self-similar functions with exponentially fewer neurons than any single wide layer would need, because each layer can reuse and compose the structure built by the layers below it.
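A minimal sketch of this depth-separation effect, using the classic tent-map construction (an illustrative example, not taken from the source): each composition of a two-ReLU "tent" unit doubles the number of oscillations, so a depth-k network with O(k) units produces a function a shallow ReLU network would need exponentially many neurons to match.

```python
import numpy as np

def tent(x):
    # Tent map built from two ReLUs: rises 0 -> 1 on [0, 0.5],
    # falls back to 0 on [0.5, 1].
    return 2 * (np.maximum(x, 0) - 2 * np.maximum(x - 0.5, 0))

def deep_tent(x, k):
    # Compose the tent unit k times: one "layer" per composition,
    # so parameter count grows linearly in k.
    for _ in range(k):
        x = tent(x)
    return x

xs = np.linspace(0, 1, 10001)
for k in (1, 2, 3):
    ys = deep_tent(xs, k)
    # Count strict local maxima on a fine grid; the count doubles
    # with each extra layer (2^(k-1) peaks at depth k).
    peaks = int(np.sum((ys[1:-1] > ys[:-2]) & (ys[1:-1] > ys[2:])))
    print(f"depth={k} peaks={peaks}")
```

A shallow ReLU network needs roughly one neuron per linear piece, so matching the 2^(k-1) oscillations at depth k would require exponentially many units in a single hidden layer.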