🎓How I Study AIHISA
📖Read
📄Papers📰Blogs🎬Courses
💡Learn
🛤️Paths📚Topics💡Concepts🎴Shorts
🎯Practice
📝Daily Log🎯Prompts🧠Review
SearchSettings
How I Study AI - Learn AI Papers & Lectures the Easy Way

Search

"safety evaluation"6 resultsKeyword

DeepSight: An All-in-One LM Safety Toolkit

Intermediate
Bo Zhang, Jiaxuan Guo et al.Feb 12arXiv

DeepSight is a free, all-in-one safety toolkit that both tests how models behave (DeepSafe) and peeks inside how they think (DeepScan).

#LLM safety evaluation#multimodal safety#frontier AI risks

Not triaged yet

Privacy Collapse: Benign Fine-Tuning Can Break Contextual Privacy in Language Models

Intermediate
Anmol Goel, Cornelius Emde et al.Jan 21arXiv

Benign fine-tuning meant to make language models more helpful can accidentally make them overshare private information.

#contextual privacy#privacy collapse#fine-tuning

Not triaged yet

OpenRT: An Open-Source Red Teaming Framework for Multimodal LLMs

Beginner
Xin Wang, Yunhao Chen et al.Jan 4arXiv

OpenRT is a big, open-source test bench that safely stress-tests AI models that handle both text and images.

#OpenRT#red teaming#multimodal LLM

Not triaged yet

COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs

Intermediate
Dasol Choi, DongGeon Lee et al.Jan 5arXiv

COMPASS is a new framework that turns a company’s rules into thousands of smart test questions to check if chatbots follow those rules.

#policy alignment#allowlist denylist#enterprise AI safety

Not triaged yet

FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments

Intermediate
Zhi Yang, Runguo Li et al.Jan 9arXiv

FinVault is a new test that checks if AI helpers for finance stay safe while actually doing real jobs, not just chatting.

#financial AI agents#execution-grounded benchmarking#sandboxed environments

Not triaged yet

MUSE: A Run-Centric Platform for Multimodal Unified Safety Evaluation of Large Language Models

Beginner
Zhongxi Wang, Yueqian Lin et al.Mar 3arXiv

MUSE is a new open-source platform that tests how safely AI models behave when you talk to them with text, sound, pictures, and video, not just text.

#MUSE#multimodal safety evaluation#red-teaming

Not triaged yet