📚TheoryIntermediate
Scaled Dot-Product Attention
Scaled dot-product attention scores how much each value V should contribute to a query by taking dot products with keys K, scaling by \(\sqrt{d_k}\), applying softmax, and forming a weighted sum.
#scaled dot-product attention#softmax#transformer+10