DeepSight is a free, all-in-one safety toolkit that both tests how models behave (DeepSafe) and peeks inside how they think (DeepScan).
This paper builds an open, end-to-end ecosystem (ALE) that lets AI agents plan, act, and fix their own mistakes across many steps in real computer environments.
OmniSafeBench-MM is a one-stop, open-source test bench that fairly compares how multimodal AI models get tricked (jailbroken) and how well different defenses stop that.