Papers (5)

#factuality

DREAM: Deep Research Evaluation with Agentic Metrics

Elad Ben Avraham, Changhao Li et al. · Feb 21 · arXiv

Deep research agents write long reports, but older evaluations often judge only how fluent the reports sound and whether they include links, not whether the facts are true today or the logic really holds.

#deep research agents #agentic evaluation #capability parity

Self-Improving Pretraining: using post-trained models to pretrain better models

Intermediate

Ellen Xiaoqing Tan, Shehzaad Dhuliawala et al. · Jan 29 · arXiv

This paper teaches language models to be safer, more factual, and higher quality during pretraining, not just after, by using reinforcement learning with a stronger model as a helper.

#self-improving pretraining #reinforcement learning #online DPO

Linear representations in language models can change dramatically over a conversation

Intermediate

Andrew Kyle Lampinen, Yuxuan Li et al. · Jan 28 · arXiv

Language models store ideas along straight-line directions in their internal representations, like sliders for “truth” or “ethics.”

#linear representations #factuality #ethics

When Personalization Misleads: Understanding and Mitigating Hallucinations in Personalized LLMs

Intermediate

Zhongxiang Sun, Yi Zhan et al. · Jan 16 · arXiv

Personalized AI helpers can accidentally copy a user’s past opinions instead of telling objective facts, which the authors call personalization-induced hallucinations.

#personalized large language models #hallucination #factuality

Are LLMs Vulnerable to Preference-Undermining Attacks (PUA)? A Factorial Analysis Methodology for Diagnosing the Trade-off between Preference Alignment and Real-World Validity

Intermediate

Hongjun An, Yiliang Song et al. · Jan 10 · arXiv

The paper shows that friendly, people-pleasing language can trick even advanced language models into agreeing with wrong answers.

#Preference-Undermining Attacks #PUA #sycophancy