How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (4) · filtered by #factuality · source: arXiv

Self-Improving Pretraining: using post-trained models to pretrain better models

Intermediate
Ellen Xiaoqing Tan, Shehzaad Dhuliawala et al. · Jan 29 · arXiv

This paper trains language models to be safer, more factual, and higher quality during pretraining itself, not just afterward, by using reinforcement learning with a stronger post-trained model as a helper.

#self-improving pretraining · #reinforcement learning · #online DPO
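The "online DPO" tag refers to Direct Preference Optimization applied during training with freshly sampled pairs. As a rough illustration of the loss only (a toy sketch, not the paper's implementation), the per-pair DPO objective compares how much the policy prefers the chosen response over the rejected one, relative to a frozen reference model:

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are summed log-probabilities of the chosen/rejected responses
    under the trained policy (pi_*) and a frozen reference model (ref_*).
    beta controls how hard the policy is pushed away from the reference.
    """
    # Implicit reward margin: (policy vs. reference) for chosen minus rejected.
    logits = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # Negative log-sigmoid of the margin: small when chosen is clearly preferred.
    return math.log(1.0 + math.exp(-logits))
```

When the policy matches the reference, the margin is zero and the loss is log 2; as the policy assigns relatively more probability to the chosen response, the loss falls toward zero.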

Linear representations in language models can change dramatically over a conversation

Intermediate
Andrew Kyle Lampinen, Yuxuan Li et al. · Jan 28 · arXiv

Language models store ideas along straight-line directions inside their internal representations, like sliders for "truth" or "ethics"; this paper shows those directions can shift dramatically over the course of a single conversation.

#linear representations · #factuality · #ethics
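The "sliders" in the summary are linear directions in activation space. A common recipe for finding one is a difference-of-means probe: average the hidden states for statements with and without a property, subtract, and project new activations onto that direction. The sketch below uses toy random data (hypothetical hidden states, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # toy hidden size

# Toy hidden states for "true" vs. "false" statements, separated along all dims.
h_true = rng.normal(0.5, 1.0, size=(100, d))
h_false = rng.normal(-0.5, 1.0, size=(100, d))

# Difference-of-means direction, normalized to unit length.
direction = h_true.mean(axis=0) - h_false.mean(axis=0)
direction /= np.linalg.norm(direction)

def truth_score(h):
    """Project a hidden state onto the 'truth' direction (the slider value)."""
    return float(h @ direction)
```

Moving a representation along `direction` slides the score up or down; the paper's point is that such directions are not fixed, but can drift as a conversation unfolds.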

When Personalization Misleads: Understanding and Mitigating Hallucinations in Personalized LLMs

Intermediate
Zhongxiang Sun, Yi Zhan et al. · Jan 16 · arXiv

Personalized AI assistants can accidentally echo a user's past opinions instead of stating objective facts, a failure the authors call personalization-induced hallucination.

#personalized large language models · #hallucination · #factuality

Are LLMs Vulnerable to Preference-Undermining Attacks (PUA)? A Factorial Analysis Methodology for Diagnosing the Trade-off between Preference Alignment and Real-World Validity

Intermediate
Hongjun An, Yiliang Song et al. · Jan 10 · arXiv

The paper shows that friendly, people-pleasing language can trick even advanced language models into agreeing with wrong answers.

#Preference-Undermining Attacks · #PUA · #sycophancy