The paper shows how to build tiny, fast safety checkers (called probes) that read a big AI's internal activity (its "brain activity") to spot dangerous cyber-attack requests.
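A minimal sketch of the probe idea, not the paper's exact method: assume you already have one hidden-state vector per prompt (e.g. from a single transformer layer) plus 0/1 labels, and train a simple logistic-regression classifier on top. The data below is random placeholder data.

```python
# Sketch of an activation "probe": a small classifier trained on a model's
# hidden states to flag dangerous cyber-attack requests.
# Assumes pre-extracted activation vectors and labels; the paper's exact
# probe architecture and layer choice may differ.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
hidden_dim = 4096

X_train = rng.normal(size=(1000, hidden_dim))   # hidden states, one row per prompt (placeholder)
y_train = rng.integers(0, 2, size=1000)          # 1 = dangerous cyber request (placeholder)

probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)

x_new = rng.normal(size=(1, hidden_dim))
print("risk score:", probe.predict_proba(x_new)[0, 1])
```

Because the probe is just a small linear model over activations, it runs in microseconds next to the main model, which is what makes it cheap enough to use as an always-on safety check.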
FrankenMotion is a new AI that generates human motion by controlling each body part separately over time, like a careful puppeteer.
This paper is the first big map of how AI can fix real software problems, not just write short code snippets.
Agent skills are like apps for AI helpers, but many of them are not carefully checked for safety yet.
The paper turns messy character descriptions from stories into neat, executable rules so role‑playing AIs act like the character in each specific scene.
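A hypothetical sketch of what "executable rules" for a character could look like; the field names and example rules are invented for illustration, not taken from the paper.

```python
# Illustrative structure for scene-conditioned character rules distilled
# from a free-text character description.
from dataclasses import dataclass, field

@dataclass
class CharacterRule:
    scene: str                                           # when the rule applies
    must: list[str] = field(default_factory=list)        # behaviors to enforce
    must_not: list[str] = field(default_factory=list)    # behaviors to forbid

rules = [
    CharacterRule(scene="tavern", must=["speak in riddles"], must_not=["reveal real name"]),
    CharacterRule(scene="battle", must=["protect allies first"]),
]

def active_rules(scene: str) -> list[CharacterRule]:
    """Return only the rules that apply to the current scene."""
    return [r for r in rules if r.scene == scene]

print(active_rules("tavern"))
```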
STEP3-VL-10B is a small (10-billion-parameter) open multimodal model that sees images and reads text, yet matches the scores of much larger models.
Traditional supervised fine-tuning (SFT) trains a model to copy a single reference answer token by token, which can make it overfit to the exact wording instead of the underlying idea.
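A toy sketch of why that happens: the standard SFT loss is cross-entropy against exactly one "gold" wording, so a paraphrase that expresses the same idea still gets a high loss. The numbers below are illustrative, not the paper's setup.

```python
# Standard SFT objective: cross-entropy against one reference token sequence.
import torch
import torch.nn.functional as F

vocab_size, seq_len = 100, 5
logits = torch.randn(seq_len, vocab_size, requires_grad=True)  # model outputs per position
reference = torch.tensor([7, 42, 3, 99, 15])                   # the single "gold" wording

# The loss rewards reproducing these exact tokens; any other phrasing
# of the same idea is penalized just as much as a wrong answer.
loss = F.cross_entropy(logits, reference)
loss.backward()
print("SFT loss on the one reference answer:", loss.item())
```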
The paper introduces Entropy Sentinel, a simple way to monitor how reliable an AI's answer is likely to be by tracking its "uncertainty heartbeat", the entropy of its outputs, while it generates.
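A minimal sketch of that heartbeat, assuming access to the model's per-step logits: compute the entropy of the next-token distribution at each generation step and flag spans where it spikes. The threshold and variable names here are illustrative, not the paper's exact recipe.

```python
# Track next-token entropy per generation step and flag high-uncertainty spans.
import torch
import torch.nn.functional as F

def token_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Shannon entropy (in nats) of the softmax distribution at each step."""
    probs = F.softmax(logits, dim=-1)
    return -(probs * torch.log(probs.clamp_min(1e-12))).sum(dim=-1)

step_logits = torch.randn(20, 32000)         # 20 generation steps, toy vocab (placeholder)
entropy_trace = token_entropy(step_logits)   # one entropy value per step

THRESHOLD = 5.0  # illustrative cut-off for "uncertain" steps
flagged = (entropy_trace > THRESHOLD).nonzero(as_tuple=True)[0]
print("high-uncertainty steps:", flagged.tolist())
```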
Ministral 3 is a new family of small-but-mighty AI language models (3B, 8B, 14B) that learn from a larger model using a step-by-step tutoring method called Cascade Distillation.
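The exact Cascade Distillation recipe is not reproduced here; the sketch below only shows the generic teacher-to-student soft-label step that cascaded schemes chain together (a larger model teaches the 14B, which in turn can teach the 8B, and so on). Temperature and sizes are illustrative.

```python
# Generic knowledge-distillation step: pull the student's output
# distribution toward the teacher's via a KL-divergence loss.
import torch
import torch.nn.functional as F

temperature = 2.0
teacher_logits = torch.randn(4, 32000)                     # toy batch of teacher outputs
student_logits = torch.randn(4, 32000, requires_grad=True) # toy student outputs

teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
student_logp = F.log_softmax(student_logits / temperature, dim=-1)

kd_loss = F.kl_div(student_logp, teacher_probs, reduction="batchmean") * temperature**2
kd_loss.backward()
print("distillation loss:", kd_loss.item())
```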
Giving large language models a few good examples and step-by-step instructions can make them much better at recognizing emotions in text.
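A small sketch of what such a prompt can look like; the example texts, labels, and instruction wording are invented for illustration rather than taken from the paper.

```python
# Build a few-shot, step-by-step prompt for emotion detection.
examples = [
    ("I can't believe they cancelled my flight again.", "frustration"),
    ("We finally got the keys to our first house!", "joy"),
]

def build_prompt(text: str) -> str:
    shots = "\n".join(f'Text: "{t}"\nEmotion: {e}' for t, e in examples)
    return (
        "Identify the main emotion in the text. "
        "First note the emotional cues, then answer with one word.\n\n"
        f"{shots}\n\nText: \"{text}\"\nEmotion:"
    )

print(build_prompt("My best friend moved away this weekend."))
```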
The paper introduces Trainee-Bench, a new way to test AI agents that feels like a real first day at work, with tasks arriving over time, hidden clues, and changing priorities.
This paper studies how AI agents that use tools express how confident they are, and finds a split: some tools make them overconfident, while others help them stay honest.