This paper checks the safety of Clawdbot (OpenClaw), a real, tool-using AI agent, by watching every step it takes during a task rather than only its final answer.
OpenRT is a large, open-source test bench for safely stress-testing AI models that handle both text and images.