Building Production-Ready Probes For Gemini
Beginner · János Kramár, Joshua Engels et al. · Jan 16 · arXiv
The paper shows how to build small, fast safety classifiers (called probes) that read a large AI model’s internal activations, its “brain activity”, to spot dangerous cyber-attack requests.
#activation probes #misuse mitigation #long-context robustness
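
To make the idea of a probe concrete, here is a minimal sketch of one common kind: a linear classifier trained on mean-pooled activations. It is an illustration under stated assumptions, not the architecture or training setup from the paper. The activations are synthetic random vectors, and the names `mean_pool`, `train_probe`, `probe_score`, and `synthetic_prompt` are hypothetical helpers invented for this example.

```python
import math
import random


def mean_pool(token_activations):
    """Average per-token activation vectors into a single vector per prompt."""
    n_tokens = len(token_activations)
    dim = len(token_activations[0])
    return [sum(tok[d] for tok in token_activations) / n_tokens for d in range(dim)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def probe_score(weights, bias, pooled):
    """Probability-like score that the pooled activation is 'harmful'."""
    return sigmoid(sum(w * x for w, x in zip(weights, pooled)) + bias)


def train_probe(examples, labels, lr=0.5, epochs=200):
    """Fit a logistic-regression probe with plain stochastic gradient descent."""
    dim = len(examples[0])
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            err = probe_score(weights, bias, x) - y  # gradient of log loss w.r.t. the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def synthetic_prompt(rng, harmful, n_tokens=8, dim=4):
    """ASSUMPTION: stand-in for real model activations.

    Harmful prompts shift dimension 0 upward, benign prompts shift it downward.
    """
    shift = 1.5 if harmful else -1.5
    return [
        [rng.gauss(shift if d == 0 else 0.0, 1.0) for d in range(dim)]
        for _ in range(n_tokens)
    ]


rng = random.Random(0)
labels = [i % 2 for i in range(40)]  # 1 = harmful, 0 = benign
pooled = [mean_pool(synthetic_prompt(rng, bool(y))) for y in labels]
weights, bias = train_probe(pooled, labels)

predictions = [1 if probe_score(weights, bias, x) > 0.5 else 0 for x in pooled]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(f"training accuracy: {accuracy:.2f}")
```

The probe itself is only a weight vector and a bias, which is why a checker of this kind can be far cheaper to run than the model it monitors. Mean pooling is one simple way to turn a variable-length prompt into a fixed-size input. It is used here for brevity, and how well it holds up on very long prompts is not something this sketch addresses.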