Papers (5)

#LLM judge

Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use

Intermediate

Aradhye Agarwal, Gurdit Siyan et al. · Mar 3 · arXiv

AI agents don’t just chat; they plan, use tools, and take many steps, so one wrong action can cause real harm.

#MOSAIC #agentic safety #plan-check-act

Steering LLMs via Scalable Interactive Oversight

Intermediate

Enyu Zhou, Zhiheng Xi et al. · Feb 4 · arXiv

The paper tackles a common problem: people can ask AI to do big, complex tasks, but they can’t always explain exactly what they want or check the results well.

#scalable oversight #interactive alignment #requirement elicitation

Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs

Intermediate

Zhiyuan Hu, Yucheng Wang et al. · Jan 13 · arXiv

The paper addresses a common problem in training AI reasoners: models get stuck using the same favorite solution style and stop exploring new ways to solve problems.

#Uniqueness-Aware Reinforcement Learning #LLM reasoning #strategy clustering

Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards

Intermediate

Jiajie Zhang, Xin Lv et al. · Jan 9 · arXiv

The paper addresses a major problem in training web-searching AI: rewarding only the final answer makes agents cut corners and sometimes hallucinate.

#deep search agents #reinforcement learning #rubric rewards

COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs

Intermediate

Dasol Choi, DongGeon Lee et al. · Jan 5 · arXiv

COMPASS is a new framework that turns a company’s rules into thousands of smart test questions to check if chatbots follow those rules.

#policy alignment #allowlist denylist #enterprise AI safety