Groups
Depth adds compositional power: stacking layers lets neural networks represent functions with many repeated patterns using far fewer neurons than a single wide layer.
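A classic way to see this (the tent-map/sawtooth construction is my chosen illustration here; the text above names no specific example): composing a small ReLU "tent" block with itself k times produces a sawtooth with 2^k oscillations, while a single hidden layer would need on the order of 2^k units to match it.

```python
def tent(x):
    # One tiny layer of two ReLU units: maps [0, 1] to a triangle wave,
    # tent(x) = 2x for x <= 0.5 and 2 - 2x for x > 0.5.
    relu = lambda z: max(z, 0.0)
    return 2 * relu(x) - 4 * relu(x - 0.5)

def deep_saw(x, depth):
    # Stacking the same block `depth` times yields 2**depth teeth,
    # so expressive power grows exponentially in depth but the
    # parameter count grows only linearly.
    for _ in range(depth):
        x = tent(x)
    return x
```

Each extra layer doubles the number of linear pieces, which is exactly the "repeated pattern" advantage of depth described above.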
Attention computes a weighted sum of the values V, where the weights are obtained by applying a softmax to similarity scores between the queries Q and the keys K (typically their scaled dot products).
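A minimal sketch of this in pure Python, assuming the standard scaled dot-product form (scores are query-key dot products divided by sqrt of the key dimension, then softmaxed):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(Q, K, V):
    # Q: list of query vectors; K: list of key vectors; V: list of value
    # vectors, one value per key. Returns one output vector per query.
    d = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        w = softmax(scores)
        # Weighted sum of the values using the attention weights.
        out.append([sum(wi * v[j] for wi, v in zip(w, V))
                    for j in range(len(V[0]))])
    return out
```

With a query that strongly matches the first key, the output is pulled almost entirely toward the first value, which makes the "weights come from similarity" idea concrete.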