Groups
Mixed precision training computes forward and backward passes in low precision (FP16/BF16) for speed and memory savings, while keeping a master copy of the weights in FP32 so small gradient updates are not lost to rounding.
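A minimal sketch of this loop in PyTorch, using `autocast` for low-precision compute and `GradScaler` to guard against FP16 gradient underflow; `model` and `loader` here are illustrative placeholders, not from the original text:

```python
import torch
from torch import nn

model = nn.Linear(512, 10).cuda()          # master weights stay in FP32
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()       # scales the loss to avoid FP16 underflow
loss_fn = nn.CrossEntropyLoss()

for inputs, targets in loader:             # hypothetical data loader
    inputs, targets = inputs.cuda(), targets.cuda()
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():        # forward pass runs in FP16/BF16
        loss = loss_fn(model(inputs), targets)
    scaler.scale(loss).backward()          # backward on the scaled loss
    scaler.step(optimizer)                 # unscales grads, updates FP32 weights
    scaler.update()                        # adapts the scale factor over time
```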
Multi-Head Attention projects queries, keys, and values into several subspaces and runs attention in parallel in each, so every head can focus on different relationships in the data; the heads' outputs are then concatenated and linearly projected back to the model dimension.
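A from-scratch sketch in PyTorch showing the split-attend-concatenate pattern with scaled dot-product attention; the dimensions and class name are illustrative assumptions:

```python
import math
import torch
from torch import nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        # One projection each for queries, keys, values, plus an output projection
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch, seq_len, d_model = x.shape
        # Project, then split the model dimension into (num_heads, d_head)
        def split(t):
            return t.view(batch, seq_len, self.num_heads, self.d_head).transpose(1, 2)
        q, k, v = split(self.q_proj(x)), split(self.k_proj(x)), split(self.v_proj(x))
        # Scaled dot-product attention, computed for all heads in parallel
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        weights = scores.softmax(dim=-1)
        out = weights @ v                   # (batch, heads, seq, d_head)
        # Concatenate heads and apply the final projection
        out = out.transpose(1, 2).reshape(batch, seq_len, d_model)
        return self.out_proj(out)

x = torch.randn(2, 5, 64)
attn = MultiHeadAttention(d_model=64, num_heads=8)
print(attn(x).shape)  # torch.Size([2, 5, 64])
```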