🎓 How I Study AI

Transformer Architecture

Master the Transformer - the foundational architecture behind GPT, BERT, and modern LLMs

Recommended for: 🤖 LLM Engineer · 🔬 ML Researcher

Prerequisites

  • Neural Network Fundamentals
  • RNNs & Sequence Models
🌱 Beginner

Understanding Transformers

What to Learn

  • Self-attention mechanism intuition
  • Query, Key, Value explained
  • Multi-head attention
  • Positional encodings
  • Encoder-decoder structure

Resources

  • 📚The Illustrated Transformer (Jay Alammar)
  • 📚Attention Is All You Need paper
  • 📚3Blue1Brown: Attention explained
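The self-attention mechanism at the heart of this level can be sketched in a few lines of NumPy. This is a toy single-head version (the function name and shapes are my own; real implementations add batching, masking, and learned projection matrices for Q, K, V):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # how strongly each query matches each key
    weights = softmax(scores, axis=-1)   # each row: a distribution over keys
    return weights @ V, weights          # output: weighted mix of value vectors

# Toy example: 3 tokens, head dimension d_k = 4
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
out, w = scaled_dot_product_attention(Q, K, V)
```

Multi-head attention runs several of these in parallel on different learned projections of the input and concatenates the results.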
🌿 Intermediate

Transformer variants and modifications

What to Learn

  • Decoder-only (GPT) vs Encoder-only (BERT)
  • Rotary Position Embeddings (RoPE)
  • Grouped Query Attention (GQA)
  • Flash Attention and efficient attention
  • Layer normalization placement (Pre-LN)

Resources

  • 📚GPT-2 and BERT papers
  • 📚Llama architecture papers
  • 📚Flash Attention paper
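Of the variants above, RoPE is easy to demystify in code. Below is a minimal NumPy sketch using the interleaved-pair convention from the original RoFormer formulation (note: some implementations, such as Llama's, instead pair dimension i with dimension i + d/2). The assertions at the end check RoPE's defining property: after rotation, the query-key dot product depends only on the relative offset between positions:

```python
import numpy as np

def rope(x, base=10000.0):
    """Rotary position embeddings, interleaved-pair convention.

    x: (seq_len, d) with d even. The pair (x[2i], x[2i+1]) at position m
    is rotated in its 2D plane by angle m * base**(-2i/d).
    """
    seq_len, d = x.shape
    half = d // 2
    freqs = base ** (-2.0 * np.arange(half) / d)           # per-pair frequency
    angles = np.arange(seq_len)[:, None] * freqs[None, :]  # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin   # standard 2D rotation, applied per pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Same query vector repeated at positions 0..3, same for the key vector.
rng = np.random.default_rng(0)
q = np.tile(rng.normal(size=8), (4, 1))
k = np.tile(rng.normal(size=8), (4, 1))
rq, rk = rope(q), rope(k)
```

Because rotations compose, R(m)ᵀR(n) = R(n − m), which is why a dot product between a rotated query and key encodes relative position for free, with no extra parameters.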
🌳 Advanced

Cutting-edge architecture research

What to Learn

  • Mixture of Experts (MoE)
  • State space alternatives to attention
  • Sparse attention patterns
  • Multi-modal transformers
  • Efficient long-context architectures

Resources

  • 📚Mixtral and Switch Transformer papers
  • 📚Mamba and RWKV papers
  • 📚Latest ICML/NeurIPS transformer papers
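The routing idea behind MoE layers (as in Switch Transformer and Mixtral) can be sketched compactly. This toy version uses plain linear experts and a loop over tokens for clarity (real MoE layers use FFN experts, batched dispatch, and an auxiliary load-balancing loss; all names here are my own):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_layer(x, W_gate, experts, k=2):
    """Route each token to its top-k experts and mix their outputs."""
    logits = x @ W_gate                          # (tokens, n_experts) router scores
    topk = np.argsort(logits, axis=-1)[:, -k:]   # top-k expert ids per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        gates = softmax(logits[t, topk[t]])      # renormalize over chosen experts
        for g, e in zip(gates, topk[t]):
            out[t] += g * (experts[e] @ x[t])    # only k of n experts run per token
    return out, topk

rng = np.random.default_rng(0)
d, n_experts, tokens = 8, 4, 5
x = rng.normal(size=(tokens, d))
W_gate = rng.normal(size=(d, n_experts))
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
out, topk = moe_layer(x, W_gate, experts, k=2)
```

The payoff is that parameter count scales with the number of experts while per-token compute scales only with k, which is how MoE models grow total capacity without a matching increase in FLOPs.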
#transformers #attention #gpt #bert