🎓 How I Study AI

Transformer Architecture

Master the Transformer - the foundational architecture behind GPT, BERT, and modern LLMs

Recommended for: 🤖 LLM Engineer · 🔬 ML Researcher

Prerequisites

  • Neural Network Fundamentals
  • RNNs & Sequence Models
🌱 Beginner

Understanding Transformers

What to Learn

  • Self-attention mechanism intuition
  • Query, Key, Value explained
  • Multi-head attention
  • Positional encodings
  • Encoder-decoder structure

Resources

  • 📚The Illustrated Transformer (Jay Alammar)
  • 📚Attention Is All You Need paper
  • 📚3Blue1Brown: Attention explained
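The self-attention mechanism at the heart of this level can be sketched in a few lines of NumPy. This is a toy single-head version (the function name and shapes are my own; real implementations add batching, masking, and learned projection matrices for Q, K, V):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # how strongly each query matches each key
    weights = softmax(scores, axis=-1)   # each row: a distribution over keys
    return weights @ V, weights          # output: weighted mix of value vectors

# Toy example: 3 tokens, head dimension d_k = 4
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
out, w = scaled_dot_product_attention(Q, K, V)
```

Multi-head attention runs several of these in parallel on different learned projections of the input and concatenates the results.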
🌿 Intermediate

Transformer variants and modifications

What to Learn

  • Decoder-only (GPT) vs Encoder-only (BERT)
  • Rotary Position Embeddings (RoPE)
  • Grouped Query Attention (GQA)
  • Flash Attention and efficient attention
  • Layer normalization placement (Pre-LN)

Resources

  • 📚GPT-2 and BERT papers
  • 📚Llama architecture papers
  • 📚Flash Attention paper
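Of the variants above, RoPE is easy to demystify in code. Below is a minimal NumPy sketch using the interleaved-pair convention from the original RoFormer formulation (note: some implementations, such as Llama's, instead pair dimension i with dimension i + d/2). The assertions at the end check RoPE's defining property: after rotation, the query-key dot product depends only on the relative offset between positions:

```python
import numpy as np

def rope(x, base=10000.0):
    """Rotary position embeddings, interleaved-pair convention.

    x: (seq_len, d) with d even. The pair (x[2i], x[2i+1]) at position m
    is rotated in its 2D plane by angle m * base**(-2i/d).
    """
    seq_len, d = x.shape
    half = d // 2
    freqs = base ** (-2.0 * np.arange(half) / d)           # per-pair frequency
    angles = np.arange(seq_len)[:, None] * freqs[None, :]  # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin   # standard 2D rotation, applied per pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Same query vector repeated at positions 0..3, same for the key vector.
rng = np.random.default_rng(0)
q = np.tile(rng.normal(size=8), (4, 1))
k = np.tile(rng.normal(size=8), (4, 1))
rq, rk = rope(q), rope(k)
```

Because rotations compose, R(m)ᵀR(n) = R(n − m), which is why a dot product between a rotated query and key encodes relative position for free, with no extra parameters.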
🌳 Advanced

Cutting-edge architecture research

What to Learn

  • Mixture of Experts (MoE)
  • State space alternatives to attention
  • Sparse attention patterns
  • Multi-modal transformers
  • Efficient long-context architectures

Resources

  • 📚Mixtral and Switch Transformer papers
  • 📚Mamba and RWKV papers
  • 📚Latest ICML/NeurIPS transformer papers
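The routing idea behind MoE layers (as in Switch Transformer and Mixtral) can be sketched compactly. This toy version uses plain linear experts and a loop over tokens for clarity (real MoE layers use FFN experts, batched dispatch, and an auxiliary load-balancing loss; all names here are my own):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_layer(x, W_gate, experts, k=2):
    """Route each token to its top-k experts and mix their outputs."""
    logits = x @ W_gate                          # (tokens, n_experts) router scores
    topk = np.argsort(logits, axis=-1)[:, -k:]   # top-k expert ids per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        gates = softmax(logits[t, topk[t]])      # renormalize over chosen experts
        for g, e in zip(gates, topk[t]):
            out[t] += g * (experts[e] @ x[t])    # only k of n experts run per token
    return out, topk

rng = np.random.default_rng(0)
d, n_experts, tokens = 8, 4, 5
x = rng.normal(size=(tokens, d))
W_gate = rng.normal(size=(d, n_experts))
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
out, topk = moe_layer(x, W_gate, experts, k=2)
```

The payoff is that parameter count scales with the number of experts while per-token compute scales only with k, which is how MoE models grow total capacity without a matching increase in FLOPs.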
#transformers #attention #gpt #bert