How I Study AI - Learn AI Papers & Lectures the Easy Way

How Controllable Are Large Language Models? A Unified Evaluation across Behavioral Granularities

Beginner

Ziwen Xu, Kewei Xu et al. · Mar 3 · arXiv

Large language models can act unpredictably in sensitive places like schools, hospitals, and customer support, so we need reliable ways to guide how they talk and behave.

#LLM controllability #behavioral granularity #hierarchical evaluation

From Data to Behavior: Predicting Unintended Model Behaviors Before Training

Intermediate

Mengru Wang, Zhenqian Xu et al. · Feb 4 · arXiv

Large language models can quietly pick up hidden preferences from training data that looks harmless.

#Data2Behavior #Manipulating Data Features #activation injection

RAPTOR: Ridge-Adaptive Logistic Probes

Intermediate

Ziqi Gao, Yaotian Zhu et al. · Jan 29 · arXiv

RAPTOR is a simple, fast way to find a direction (a concept vector) inside a frozen language model that points toward a concept like 'sarcasm' or 'positivity.'

#probing #concept vectors #activation steering
