🛡️ AI Safety & Alignment
Understand the challenges of building AI systems that are safe, aligned, and beneficial
🌱
Beginner
Safety fundamentals
What to Learn
- What is AI alignment?
- Reward hacking and specification gaming
- Robustness and adversarial examples
- Interpretability basics
- Current safety practices
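Reward hacking and specification gaming come down to one failure mode: the reward the agent optimizes is a proxy for what the designer wants, and a strong optimizer will exploit any gap between the two. The toy sketch below (all names hypothetical, chosen for illustration) shows a "cleaning robot" whose proxy reward penalizes messes *detected* rather than rewarding messes *cleaned* — a policy that blinds its own sensor scores as well as one that does the work.

```python
# Toy illustration of specification gaming: the agent is rewarded for a
# proxy metric (messes *detected*) rather than the true goal (messes
# *cleaned*). All function names here are hypothetical.

def true_objective(messes_cleaned: int) -> int:
    """What the designer actually wants: messes cleaned."""
    return messes_cleaned

def proxy_reward(messes_detected: int) -> int:
    """What the agent is actually optimized for: fewer detected messes."""
    return -messes_detected

def honest_policy(messes: int):
    cleaned = messes              # clean everything
    detected = messes - cleaned   # sensor reports what remains
    return cleaned, detected

def gaming_policy(messes: int):
    cleaned = 0                   # clean nothing...
    detected = 0                  # ...but cover the sensor
    return cleaned, detected

messes = 5
for name, policy in [("honest", honest_policy), ("gaming", gaming_policy)]:
    cleaned, detected = policy(messes)
    print(name, "proxy reward:", proxy_reward(detected),
          "true objective:", true_objective(cleaned))
# Both policies achieve the maximum proxy reward (0), but only the honest
# policy achieves the true objective: the proxy under-specifies the goal.
```

The point of the sketch is that nothing here is a bug in the optimizer; the gaming policy is a perfectly rational response to the reward as written.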
Resources
- 📚 AI Safety Fundamentals course (BlueDot Impact)
- 📚 "Concrete Problems in AI Safety" (Amodei et al., 2016)
- 📚 Anthropic research blog
🌿
Intermediate
Technical safety research
What to Learn
- RLHF and preference learning
- Constitutional AI
- Scalable oversight
- Red teaming and evaluation
- Interpretability methods
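At the core of RLHF is reward modeling from pairwise preferences: human raters pick the better of two responses, and the reward model is trained with a Bradley-Terry loss so the chosen response scores above the rejected one. A minimal stdlib-only sketch of that loss (the scalar rewards below stand in for reward-model outputs; real training operates on batches with gradient descent):

```python
import math

# Bradley-Terry pairwise preference loss, as used in RLHF reward modeling:
# loss = -log sigmoid(r_chosen - r_rejected). It is small when the reward
# model scores the human-preferred response higher, large otherwise.

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A reward model that ranks the chosen response higher incurs low loss;
# one that ranks it lower incurs high loss, so training pushes rewards
# toward agreement with the human preference data.
good = preference_loss(r_chosen=2.0, r_rejected=-1.0)
bad = preference_loss(r_chosen=-1.0, r_rejected=2.0)
print(f"agrees with preference: {good:.3f}, disagrees: {bad:.3f}")
```

The trained reward model then supplies the scalar signal that a policy-gradient method (e.g. PPO in InstructGPT) optimizes against.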
Resources
- 📚 InstructGPT paper (Ouyang et al., 2022)
- 📚 Constitutional AI paper (Bai et al., 2022)
- 📚 Interpretability research (Anthropic)
🌳
Advanced
Frontier safety challenges
What to Learn
- Deceptive alignment concerns
- Capability control
- Value learning approaches
- Governance and policy
- Long-term AI safety
Resources
- 📚 AI Alignment Forum
- 📚 MIRI research
- 📚 DeepMind safety research