Papers (2)

#LLM factuality

Empty Shelves or Lost Keys? Recall Is the Bottleneck for Parametric Factuality

Nitay Calderon, Eyal Ben-David et al. · Feb 15 · arXiv

A wrong answer from a large language model (LLM) does not always mean the model never learned the fact; often the model has encoded it but fails to recall it on demand.

#LLM factuality #encoding vs recall #knowledge profiling

The FACTS Leaderboard: A Comprehensive Benchmark for Large Language Model Factuality

Beginner

Aileen Cheng, Alon Jacovi et al. · Dec 11 · arXiv

The FACTS Leaderboard is a four-part benchmark that measures how factual AI models are across image-based questions, memory, web search, and document grounding.

#LLM factuality #benchmarking #multimodal evaluation
