🎓How I Study AIHISA
📖Read
📄Papers📰Blogs🎬Courses
💡Learn
🛤️Paths📚Topics💡Concepts🎴Shorts
🎯Practice
🧩Problems🎯Prompts🧠Review
Search
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers4

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#distribution shift

THINKSAFE: Self-Generated Safety Alignment for Reasoning Models

Intermediate
Seanie Lee, Sangwoo Park et al.Jan 30arXiv

Large reasoning models got very good at thinking step-by-step, but that sometimes made them too eager to follow harmful instructions.

#THINKSAFE#self-generated safety alignment#refusal steering

Building Production-Ready Probes For Gemini

Beginner
János Kramár, Joshua Engels et al.Jan 16arXiv

The paper shows how to build tiny, fast safety checkers (called probes) that look inside a big AI’s brain activity to spot dangerous cyber-attack requests.

#activation probes#misuse mitigation#long-context robustness

Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning

Intermediate
Zhiyuan Hu, Yunhai Hu et al.Jan 14arXiv

This paper introduces MATTRL, a way for multiple AI agents to learn from their own conversations at test time using short, reusable text notes instead of retraining their weights.

#multi-agent systems#test-time reinforcement learning#experience retrieval

Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning

Intermediate
Zhenpeng Su, Leiyu Pan et al.Dec 5arXiv

Reinforcement learning (RL) can make big language models smarter, but off-policy training often pushes updates too far from the “safe zone,” causing unstable learning.

#reinforcement learning#PPO-clip#KL penalty