The paper shows how to build tiny, fast safety checkers (called probes) that look at a large AI model’s internal activity (its activations) to spot dangerous cyber-attack requests.
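To make the idea of a probe concrete, here is a minimal sketch. It assumes the probe is a linear (logistic-regression) classifier trained on activation vectors; the summary does not say which probe architecture the paper uses, so that choice, the synthetic 8-dimensional "activations", and every function name below are illustrative assumptions rather than the authors' method.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(w, b, x):
    # Probe output: estimated probability that activation vector x is "harmful".
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_probe(xs, ys, lr=0.1, epochs=200):
    # Fit weights w and bias b by batch gradient descent on the log loss.
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = score(w, b, x) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def make_data(rng, n, dim=8):
    # Hypothetical stand-in for real activations (an assumption, not real data):
    # harmful examples (label 1) are shifted by +1 in every dimension,
    # benign examples (label 0) by -1.
    xs, ys = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        shift = 1.0 if y == 1 else -1.0
        xs.append([rng.gauss(shift, 1.0) for _ in range(dim)])
        ys.append(y)
    return xs, ys

rng = random.Random(0)  # fixed seed so the run is repeatable
train_x, train_y = make_data(rng, 200)
test_x, test_y = make_data(rng, 100)

w, b = train_probe(train_x, train_y)
preds = [1 if score(w, b, x) >= 0.5 else 0 for x in test_x]
accuracy = sum(p == y for p, y in zip(preds, test_y)) / len(test_y)
print(f"held-out accuracy: {accuracy:.2f}")
```

In a real setting the rows would presumably be activations captured from inside the model rather than random numbers. Under the linear assumption, checking one request costs a single dot product, which is one way such a checker can stay tiny and fast.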
The authors built a simple six-agent system to see if today’s AI models could plan and carry out research, then write up the resulting paper, mostly on their own.