How I Study AI - Learn AI Papers & Lectures the Easy Way


Keyword search for "red teaming": 3 results

Constitutional Classifiers: Defending against universal jailbreaks

Anthropic · Feb 8 · Anthropic


OpenRT: An Open-Source Red Teaming Framework for Multimodal LLMs

Beginner
Xin Wang, Yunhao Chen et al. · Jan 4 · arXiv

OpenRT is a large, open-source red-teaming framework for safely stress-testing multimodal LLMs, meaning AI models that handle both text and images.

#OpenRT · #red teaming · #multimodal LLM


Building Production-Ready Probes For Gemini

Beginner
János Kramár, Joshua Engels et al. · Jan 16 · arXiv

The paper shows how to build small, fast safety checkers, called probes, that read Gemini's internal activations (its "brain activity") to detect requests for help with dangerous cyber attacks.

#activation probes · #misuse mitigation · #long-context robustness
