Papers (2)

#evidence alignment

CiteAudit: You Cited It, But Did You Read It? A Benchmark for Verifying Scientific References in the LLM Era

Zhengqing Yuan, Kaiwen Shi et al. | Feb 26 | arXiv

The paper tackles a new integrity problem in science: large language models sometimes invent realistic-looking citations that do not exist.

#citation verification, #hallucinated citations, #scholarly integrity


LiveMedBench: A Contamination-Free Medical Benchmark for LLMs with Automated Rubric Evaluation

Beginner

Zhiling Yan, Dingjie Song et al. | Feb 10 | arXiv

LiveMedBench is a new, continuously updated benchmark for medical LLMs that keeps its test questions separate from training data, so models cannot score well through memorization alone.

#LiveMedBench, #medical benchmark, #data contamination
