Transformer Architecture
Master the Transformer - the foundational architecture behind GPT, BERT, and modern LLMs
🌱
Beginner
Understanding Transformers
What to Learn
- Self-attention mechanism intuition
- Query, Key, Value explained (see the sketch after this list)
- Multi-head attention
- Positional encodings
- Encoder-decoder structure
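To make the Query/Key/Value idea concrete, here is a minimal NumPy sketch of scaled dot-product attention, the core operation from "Attention Is All You Need". The shapes, random projections, and variable names are illustrative assumptions, not any particular library's API.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, the core of self-attention."""
    d_k = Q.shape[-1]
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)  # (batch, seq, seq): each query scored against every key
    weights = softmax(scores, axis=-1)                # each row is a distribution over positions
    return weights @ V                                # weighted sum of value vectors

# Toy example: 1 sequence of 4 tokens, model width 8 (dimensions chosen arbitrarily)
rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4, 8))                        # token embeddings
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v                   # learned projections in a real model
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)                                      # (1, 4, 8)
```

Multi-head attention runs several such attentions in parallel on lower-dimensional projections and concatenates the results before a final output projection.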
Resources
- 📚 The Illustrated Transformer (Jay Alammar)
- 📚 Attention Is All You Need paper
- 📚 3Blue1Brown: Attention explained
🌿
Intermediate
Transformer variants and modifications
What to Learn
- Decoder-only (GPT) vs Encoder-only (BERT)
- Rotary Position Embeddings (RoPE) (see the sketch after this list)
- Grouped Query Attention (GQA)
- Flash Attention and efficient attention
- Layer normalization placement (Pre-LN)
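As a concrete reference for one of the items above, here is a hedged sketch of Rotary Position Embeddings (RoPE): each consecutive pair of channels in a query or key vector is rotated by a position-dependent angle, so relative offsets between positions show up as phase differences in the Q·K dot products. The function name, base constant, and shapes are assumptions for illustration, not a specific model's implementation.

```python
import numpy as np

def rope_rotate(x, base=10000.0):
    """Apply rotary position embeddings to x of shape (seq_len, d), d even.

    Channel pair (2i, 2i+1) at position p is rotated by angle p * theta_i,
    where theta_i = base ** (-2i / d).
    """
    seq_len, d = x.shape
    pos = np.arange(seq_len)[:, None]              # (seq, 1) positions
    theta = base ** (-np.arange(0, d, 2) / d)      # (d/2,) per-pair frequencies
    angles = pos * theta                           # (seq, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    rotated = np.empty_like(x)
    rotated[:, 0::2] = x_even * cos - x_odd * sin  # standard 2D rotation per pair
    rotated[:, 1::2] = x_even * sin + x_odd * cos
    return rotated

q = np.random.default_rng(1).normal(size=(6, 16))  # 6 positions, head dim 16
print(rope_rotate(q).shape)                        # (6, 16)
```

In practice the rotation is applied to queries and keys (not values) inside each attention head, before the dot products are taken.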
Resources
- 📚 GPT-2 and BERT papers
- 📚 Llama architecture papers
- 📚 Flash Attention paper
🌳
Advanced
Cutting-edge architecture research
What to Learn
- Mixture of Experts (MoE) (see the sketch after this list)
- State space alternatives to attention
- Sparse attention patterns
- Multi-modal transformers
- Efficient long-context architectures
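To see why Mixture of Experts keeps compute sparse, here is a toy top-k routing sketch: a router scores each token, only the top_k experts run for that token, and their outputs are mixed using the renormalized gate probabilities. The linear-plus-ReLU experts, shapes, and names are illustrative assumptions; real MoE layers (e.g. in Mixtral or Switch Transformer) add load-balancing losses and expert capacity limits.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def moe_layer(x, router_w, expert_ws, top_k=2):
    """Sparse Mixture-of-Experts feed-forward layer with top-k routing.

    x: (tokens, d) token representations
    router_w: (d, n_experts) gating weights
    expert_ws: list of (d, d) weight matrices, one toy expert each
    """
    probs = softmax(x @ router_w, axis=-1)              # (tokens, n_experts) router probabilities
    top_idx = np.argsort(-probs, axis=-1)[:, :top_k]    # chosen experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        chosen = top_idx[t]
        gate = probs[t, chosen]
        gate = gate / gate.sum()                         # renormalize over the chosen experts
        for e, g in zip(chosen, gate):
            out[t] += g * np.maximum(x[t] @ expert_ws[e], 0.0)  # ReLU expert, weighted by its gate
    return out

rng = np.random.default_rng(2)
d, n_experts, tokens = 16, 4, 8
x = rng.normal(size=(tokens, d))
router_w = rng.normal(size=(d, n_experts))
expert_ws = [rng.normal(size=(d, d)) for _ in range(n_experts)]
print(moe_layer(x, router_w, expert_ws).shape)           # (8, 16)
```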
Resources
- 📚 Mixtral and Switch Transformer papers
- 📚 Mamba and RWKV papers
- 📚 Latest ICML/NeurIPS transformer papers