A wrong answer from a large language model (LLM) does not always mean the model never learned the fact. Often the model knows the fact but fails to retrieve it on demand.
The FACTS Leaderboard is a four-part benchmark that measures how factually accurate AI models are across images, memory, web search, and document grounding.