This paper builds an open, end-to-end ecosystem (ALE) that lets AI agents plan, act, and fix their own mistakes across many steps in real computer environments.
OmniSafeBench-MM is a one-stop, open-source test bench for fairly comparing how multimodal AI models get tricked (jailbroken) and how well different defenses stop those attacks.