Papers (6)

#prompt injection

Agents of Chaos

Natalie Shapira, Chris Wendler et al. · Feb 23 · arXiv

This paper puts real AI agents into a safe, live sandbox and has expert testers deliberately try to break them to see what fails.

#AI agents #red teaming #identity verification


A Trajectory-Based Safety Audit of Clawdbot (OpenClaw)

Beginner

Tianyu Chen, Dongrui Liu et al. · Feb 16 · arXiv

This paper assesses how safe a real, tool-using AI agent called Clawdbot (OpenClaw) is by examining every step it takes during a task (its full trajectory), not just the final answer.

#trajectory-centric safety #tool-using AI agents #prompt injection


AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security

Intermediate

Dongrui Liu, Qihan Ren et al. · Jan 26 · arXiv

AgentDoG is a new ‘diagnostic guardrail’ that watches AI agents step-by-step and explains exactly why a risky action happened.

#AgentDoG #AI agent safety #diagnostic guardrail


Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale

Beginner

Yi Liu, Weizhe Wang et al. · Jan 15 · arXiv

Agent skills are like apps for AI helpers, but many of them have not yet been carefully checked for security.

#agent skills #AI security #prompt injection


ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback

Intermediate

Yutao Mou, Zhangchi Xue et al. · Jan 15 · arXiv

ToolSafe is a new way to keep AI agents safe when they use external tools, by checking each action before it runs.

#step-level safety #tool invocation #LLM agents


FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments

Intermediate

Zhi Yang, Runguo Li et al. · Jan 9 · arXiv

FinVault is a new benchmark that checks whether AI agents for finance stay safe while actually carrying out real tasks, not just chatting.

#financial AI agents #execution-grounded benchmarking #sandboxed environments
