How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (3)

#representation steering

How Controllable Are Large Language Models? A Unified Evaluation across Behavioral Granularities

Beginner
Ziwen Xu, Kewei Xu et al. · Mar 3 · arXiv

Large language models can act unpredictably in sensitive settings like schools, hospitals, and customer support, so we need reliable ways to guide how they talk and behave.

#LLM controllability #behavioral granularity #hierarchical evaluation

From Data to Behavior: Predicting Unintended Model Behaviors Before Training

Intermediate
Mengru Wang, Zhenqian Xu et al. · Feb 4 · arXiv

Large language models can quietly pick up hidden preferences from training data that looks harmless, so this work tries to predict such unintended behaviors before training begins.

#Data2Behavior #Manipulating Data Features #activation injection

RAPTOR: Ridge-Adaptive Logistic Probes

Intermediate
Ziqi Gao, Yaotian Zhu et al. · Jan 29 · arXiv

RAPTOR is a simple, fast way to find a direction (a concept vector) inside a frozen language model that points toward a concept like 'sarcasm' or 'positivity.'

#probing #concept vectors #activation steering
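The RAPTOR summary above describes the general probing idea: fit a ridge-regularized logistic classifier on a model's hidden activations, and take its weight vector as the concept direction. The paper's exact recipe isn't reproduced here; the sketch below is a toy illustration on synthetic "activations" (the dimension, sample count, learning rate, and regularization strength are all made-up values for demonstration).

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 16, 400                      # toy hidden-state dimension and sample count
true_dir = np.zeros(d)
true_dir[0] = 1.0                   # hypothetical ground-truth concept axis

# Synthetic frozen-model activations; label = "concept present"
X = rng.normal(size=(n, d))
y = (X @ true_dir > 0).astype(float)

# Ridge-regularized logistic probe trained by plain gradient descent
w = np.zeros(d)
lam, lr = 0.1, 0.5
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))          # predicted probabilities
    grad = X.T @ (p - y) / n + lam * w          # logistic loss gradient + ridge term
    w -= lr * grad

# Normalized probe weights serve as the concept vector
concept_vector = w / np.linalg.norm(w)
```

On this synthetic data the learned `concept_vector` ends up closely aligned with `true_dir`, which is the property that makes such vectors usable for activation steering.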