How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (2)

#false positive rate

Bielik Guard: Efficient Polish Language Safety Classifiers for LLM Content Moderation

Intermediate
Krzysztof Wróbel, Jan Maria Kowalski et al. | Feb 8 | arXiv

Bielik Guard is a pair of small but strong Polish language safety models that check text for five kinds of risky content: hate/aggression, vulgar language, sexual content, crime, and self-harm.

#Polish NLP #content moderation #safety classifier

ASA: Training-Free Representation Engineering for Tool-Calling Agents

Intermediate
Youjin Wang, Run Zhou et al. | Feb 4 | arXiv

The paper finds a strange gap: the model’s hidden states (its internal representations) almost perfectly show when it should use a tool, but the text it actually generates often fails to trigger the tool under strict rules.

#activation steering #representation engineering #tool calling