Decoder-only language models can produce strong user-profile embeddings, but the attention-masking scheme, which governs how each token can attend over the input sequence, has a large effect on the quality of those embeddings.
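A minimal sketch of the distinction, assuming standard scaled dot-product attention and a mean-pooled readout (the pooling choice is an illustrative assumption, not the source's exact setup): a causal mask restricts each position to earlier positions, while a bidirectional mask lets every position attend everywhere.

```python
import torch

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention; mask is boolean, True = attention allowed."""
    scores = (q @ k.T) / (q.shape[-1] ** 0.5)
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

seq_len, d = 6, 8
q, k, v = (torch.randn(seq_len, d) for _ in range(3))

# Causal mask: position i attends only to positions j <= i (the decoder default).
causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

# Bidirectional mask: every position attends to every other position,
# as in encoder-style models.
bidirectional = torch.ones(seq_len, seq_len, dtype=torch.bool)

causal_out = masked_attention(q, k, v, causal)
bidi_out = masked_attention(q, k, v, bidirectional)

# Mean-pooling token outputs into one embedding is a common (assumed) readout;
# under the causal mask, early tokens never see later ones, which can
# impoverish the pooled representation.
causal_emb = causal_out.mean(dim=0)
bidi_emb = bidi_out.mean(dim=0)
print(causal_emb.shape, bidi_emb.shape)
```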
Placing the reading passage (context) before the question and answer options, the CQO ordering, makes language models substantially more accurate than placing it after them (QOC), by roughly 15 percentage points on average. A plausible mechanism is that under causal masking the question and option tokens can attend to the passage only when it precedes them.
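The two orderings can be made concrete with a pair of prompt builders. The exact template wording ("Question:", "Answer:", letter labels) is an assumption for illustration, not a verbatim prompt from the source.

```python
def build_cqo(context: str, question: str, options: list[str]) -> str:
    """CQO ordering: context first, then question, then options."""
    labeled = "\n".join(f"{chr(65 + i)}. {opt}" for i, opt in enumerate(options))
    return f"{context}\n\nQuestion: {question}\n{labeled}\nAnswer:"

def build_qoc(context: str, question: str, options: list[str]) -> str:
    """QOC ordering: question and options first, context last."""
    labeled = "\n".join(f"{chr(65 + i)}. {opt}" for i, opt in enumerate(options))
    return f"Question: {question}\n{labeled}\n\n{context}\nAnswer:"

print(build_cqo(
    "The sky appears blue because of Rayleigh scattering.",
    "Why is the sky blue?",
    ["Rayleigh scattering", "Ozone absorption"],
))
```

With a causal decoder, only the CQO prompt lets every question and option token condition on the passage, which is consistent with the accuracy gap reported above.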