How I Study AI - Learn AI Papers & Lectures the Easy Way

"AI safety"6 resultsKeyword

OpenRT: An Open-Source Red Teaming Framework for Multimodal LLMs

Beginner
Xin Wang, Yunhao Chen et al. · Jan 4 · arXiv

OpenRT is a large, open-source test bench for safely stress-testing AI models that handle both text and images.

#OpenRT #red teaming #multimodal LLM

COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs

Intermediate
Dasol Choi, DongGeon Lee et al. · Jan 5 · arXiv

COMPASS is a new framework that turns a company's policies into thousands of targeted test questions to check whether chatbots actually follow those rules.

#policy alignment #allowlist denylist #enterprise AI safety

Building Production-Ready Probes For Gemini

Beginner
János Kramár, Joshua Engels et al. · Jan 16 · arXiv

The paper shows how to build small, fast safety classifiers (called probes) that read a large model's internal activations to spot dangerous cyber-attack requests.

#activation probes #misuse mitigation #long-context robustness

Products

Beginner
Anthropic · Apr 1 · Anthropic

Australians use Claude heavily: about four times more per person than their share of the population would predict.

#Anthropic Economic Index #Anthropic AI Usage Index #Claude adoption Australia

Announcing the OpenAI Safety Fellowship

Beginner
OpenAI Blog · Apr 6 · OpenAI

OpenAI announced a short, mentored Safety Fellowship (Sep 14, 2026 to Feb 5, 2027) to help independent researchers do high-impact work on making AI safer.

#AI safety #AI alignment #robustness

Products

Beginner
Anthropic · Feb 26 · Anthropic

Anthropic retired its Claude Opus 3 model but kept it available to paid users, and by API request, to reduce disruption and support research.

#AI model retirement #model preservation #model deprecation
