The paper tackles a new integrity problem in science: large language models sometimes invent realistic-looking citations that do not exist.
This paper builds a new test, called MURGAT, to check whether AI models can back up each individual fact they state with the right part of a video, audio clip, or figure.
The paper finds almost 300 accepted NLP papers (mostly from 2025) that include at least one fake or non-existent reference, which the authors call a HalluCitation.