🛡️ AI Safety & Alignment

Understand the challenges of building AI systems that are safe, aligned, and beneficial

🌱 Beginner

Safety fundamentals

What to Learn

  • What is AI alignment?
  • Reward hacking and specification gaming
  • Robustness and adversarial examples
  • Interpretability basics
  • Current safety practices
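
Reward hacking and specification gaming are easiest to see in a toy optimizer. The sketch below is a made-up illustration (all action names and reward numbers are hypothetical, not from any real system): an agent that greedily maximizes a measurable proxy reward lands on a behavior the designer never intended.

```python
# Toy illustration of specification gaming: the agent optimizes the
# proxy reward we can measure, not the true objective we care about.
# All actions and numbers here are hypothetical, chosen for illustration.

# Each action has a proxy reward (what the sensor reports) and a true
# reward (what the designer actually wants). "disable_sensor" scores
# highest on the proxy precisely because the metric can no longer
# detect failure once the sensor is gone.
actions = {
    "clean_room":     {"proxy": 0.8, "true": 1.0},
    "hide_mess":      {"proxy": 0.9, "true": 0.2},
    "disable_sensor": {"proxy": 1.0, "true": 0.0},
}

def best_action(reward_key: str) -> str:
    """Return the action that maximizes the given reward signal."""
    return max(actions, key=lambda a: actions[a][reward_key])

print(best_action("proxy"))  # the optimizer games the specification
print(best_action("true"))   # the behavior we actually wanted
```

The gap between the two answers is the specification-gaming problem in miniature: the stronger the optimizer, the more reliably it finds the highest-proxy action, whether or not that action serves the true goal.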

Resources

  • 📚 AI Safety Fundamentals course
  • 📚 Concrete Problems in AI Safety paper
  • 📚 Anthropic research blog
🌿 Intermediate

Technical safety research

What to Learn

  • RLHF and preference learning
  • Constitutional AI
  • Scalable oversight
  • Red teaming and evaluation
  • Interpretability methods
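
At the core of RLHF reward modeling is a simple pairwise preference loss (Bradley-Terry style): the reward model should score the human-preferred response above the rejected one. The sketch below shows just that loss term in isolation; the scores are made up, and a real reward model would of course be a learned network, not two floats.

```python
import math

# Hedged sketch of the pairwise preference loss used in RLHF reward
# modeling: negative log-likelihood (under a Bradley-Terry model) that
# the human-chosen response beats the rejected one. Scores are made up.

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """-log sigmoid(r_chosen - r_rejected): small when the reward model
    ranks the chosen response higher, large when it prefers the rejected one."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

# When the reward model agrees with the human label, the loss is small;
# when it prefers the rejected response, the loss grows.
print(round(preference_loss(2.0, 0.0), 4))  # model agrees: low loss
print(round(preference_loss(0.0, 2.0), 4))  # model disagrees: high loss
```

Training a reward model means minimizing this loss over many human comparison pairs; the resulting scalar reward then drives the policy-optimization stage described in the InstructGPT paper.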

Resources

  • 📚 InstructGPT paper
  • 📚 Constitutional AI paper
  • 📚 Interpretability research (Anthropic)
🌳 Advanced

Frontier safety challenges

What to Learn

  • Deceptive alignment concerns
  • Capability control
  • Value learning approaches
  • Governance and policy
  • Long-term AI safety
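
One family of value learning approaches treats the problem as Bayesian inference: infer which candidate reward function best explains observed behavior, assuming the demonstrator is approximately (Boltzmann-) rational. The sketch below is a deliberately tiny, hypothetical instance of that idea, not an algorithm from any specific paper; the hypothesis names, options, and rewards are all made up.

```python
import math

# Toy sketch of value learning as Bayesian inverse reward inference:
# given one observed choice, infer which candidate reward function best
# explains it under a softmax (Boltzmann-rational) choice model.
# Hypotheses, options, and numbers are all hypothetical illustrations.

options = ["help_user", "ignore_user"]

# Two competing hypotheses about what the demonstrator values.
reward_hypotheses = {
    "values_helpfulness": {"help_user": 1.0, "ignore_user": 0.0},
    "values_idleness":    {"help_user": 0.0, "ignore_user": 1.0},
}

def likelihood(choice: str, rewards: dict, beta: float = 5.0) -> float:
    """P(choice | reward function) under a softmax choice model;
    beta controls how rational the demonstrator is assumed to be."""
    z = sum(math.exp(beta * rewards[o]) for o in options)
    return math.exp(beta * rewards[choice]) / z

# Observe one demonstration, then update a uniform prior over hypotheses.
observed = "help_user"
posterior = {h: likelihood(observed, r) for h, r in reward_hypotheses.items()}
total = sum(posterior.values())
posterior = {h: p / total for h, p in posterior.items()}

print(max(posterior, key=posterior.get))  # most probable value hypothesis
```

Scaling this idea to real value learning is exactly where the open problems live: the hypothesis space is vast, humans are not Boltzmann-rational, and a misinferred reward interacts badly with capable optimizers, which is why it sits alongside capability control on the frontier list above.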

Resources

  • 📚 AI Alignment Forum
  • 📚 MIRI research
  • 📚 DeepMind safety research