Concepts (2)
📚 Theory · Advanced
Transformer Theory
Transformers map sequences to sequences using stacked layers of self-attention and position-wise feed-forward networks, each wrapped with a residual connection and LayerNorm; see the sketch after the tags below.
#transformer #self-attention #positional encoding (+12 more)
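A minimal sketch of one such encoder block, assuming PyTorch; the hyperparameters (d_model, n_heads, d_ff) and the post-norm arrangement are illustrative choices, not taken from the card.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        # Multi-head self-attention: queries, keys, and values all come from x.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Position-wise feed-forward network applied to every token independently.
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Self-attention sub-layer wrapped with a residual connection and LayerNorm.
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + attn_out)
        # Feed-forward sub-layer, again with residual + LayerNorm.
        x = self.norm2(x + self.ff(x))
        return x

# Usage: a batch of 2 sequences, 10 tokens each, 512-dimensional embeddings.
block = TransformerBlock()
y = block(torch.randn(2, 10, 512))
print(y.shape)  # torch.Size([2, 10, 512])
```

A full transformer stacks several of these blocks and adds token embeddings plus positional encodings at the input, since self-attention by itself is order-agnostic.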
📚 Theory · Intermediate
Attention Mechanism Theory
Attention computes a weighted sum of the values V, where the weights are derived from the similarity between the queries Q and the keys K; see the sketch after the tags below.
#attention #self-attention #multi-head attention (+12 more)
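A minimal sketch of scaled dot-product attention, assuming NumPy; the shapes and the example inputs below are illustrative, not from the card.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v).
    Each query attends to all keys; the softmax turns similarities into
    weights that sum to 1, and the output is the weighted sum of V.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of each query to each key
    weights = softmax(scores, axis=-1)   # one weight distribution per query
    return weights @ V                   # weighted sum of the values

# Usage: 3 queries and 4 key/value pairs with d_k = d_v = 8.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
out = attention(Q, K, V)
print(out.shape)  # (3, 8)
```

Multi-head attention runs several such attention computations in parallel on learned linear projections of Q, K, and V, then concatenates and projects the results.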