Accuracy alone can make AI agents look good on paper while still failing in real life; this paper shows how to measure reliability properly.
RAPTOR is a simple, fast way to find a direction (a concept vector) inside a frozen language model that points toward a concept like 'sarcasm' or 'positivity.'
Skyra is a detective-style AI that spots tiny visual mistakes (artifacts) in videos to tell if they are real or AI-generated, and it explains its decision with times and places in the video.