Search

"safety evaluation"6 resultsKeyword

DeepSight is a free, all-in-one safety toolkit that both tests how models behave (DeepSafe) and peeks inside how they think (DeepScan).

Not triaged yet

Benign fine-tuning meant to make language models more helpful can accidentally make them overshare private information.

Not triaged yet

OpenRT is a big, open-source test bench that safely stress-tests AI models that handle both text and images.

Not triaged yet

COMPASS is a new framework that turns a company’s rules into thousands of smart test questions to check if chatbots follow those rules.

Not triaged yet

FinVault is a new test that checks if AI helpers for finance stay safe while actually doing real jobs, not just chatting.

Not triaged yet

MUSE is a new open-source platform that tests how safely AI models behave when you talk to them with text, sound, pictures, and video, not just text.

Not triaged yet