NanoKnow is a new benchmark that checks whether a language model's answers come from what it saw during training or from extra text we give it at question time.
The FACTS Leaderboard is a four-part test that checks how truthful AI models are across images, memory, web search, and document grounding.