What is the difference between citation hallucination and citation precision?

Citation hallucination means the cited source does not exist at all; the model invented the reference. Citation precision is about real sources: it measures whether a source that does exist and was retrieved actually supports the claim it is attached to. Fabrication is 'the source is not real'; low precision is 'the source is real but does not back the claim.'

Has citation hallucination gotten better with newer models?

It improved substantially from GPT-3.5 to GPT-4 in Walters and Wilder's ungrounded literature-review task (55% to 18% fabricated), but current rates vary by model, task, domain, and whether retrieval grounding is enabled. Grounded answering reduces the invention of wholly non-existent sources, but it does not remove related failures such as broken or fabricated URLs, which a Tow Center study found in more than half of Gemini and Grok 3 responses. The risk of a wholly invented source is concentrated in ungrounded chat answering from training memory.

Can a publisher stop AI engines from fabricating citations to their site?

No. Citation hallucination is an engine-side failure; a publisher cannot make a model stop inventing sources. What helps indirectly is being a real, easily verifiable, well-structured source: clean, cite-able pages are more likely to be usable as grounded sources, though whether you are cited still depends on retrieval, ranking, query fit, and product behavior.

Citation hallucination

What is citation hallucination?

Citation hallucination is when an AI system produces a citation (a reference, URL, case, paper, or quotation) attributed to a source that does not exist or cannot be verified as published. The cited work was never published, posted, or otherwise disseminated; the model invented it, often by recombining real fragments (a real author with a fake title, a real journal with a fabricated volume), which is part of why fabricated citations are hard to spot. In a study of 636 citations generated by ChatGPT across 84 short literature reviews, Walters and Wilder found that 55% of GPT-3.5 citations and 18% of GPT-4 citations were fabricated, where a citation counted as fabricated if it did "not exist as actual works that have been published, presented, posted, or otherwise publicly disseminated"¹.

It sits alongside two adjacent failure modes the glossary tracks separately. Hallucination grounding asks whether an answer is anchored in retrieved content at all. Citation precision asks whether a real, retrieved source actually supports the claim it is attached to. Citation hallucination is the third case: the cited source itself is not real.

Status in 2026

Citation hallucination is improving with each model generation but is not solved, and the picture differs sharply between ungrounded and grounded answering.

In ungrounded generation (a chatbot answering from training memory with no web retrieval), inventing a wholly non-existent source is common. Walters and Wilder found 55% of GPT-3.5 and 18% of GPT-4 citations fabricated¹, so the fabrication rate falls with model capability but does not disappear. The best-known real-world consequence is Mata v. Avianca, in which a Southern District of New York court sanctioned two lawyers and their law firm $5,000 on June 22, 2023 for filing a brief built on fake cases that ChatGPT had fabricated, complete with invented quotations².

In grounded AI search (where the engine cites pages it actually fetched, as in Perplexity or ChatGPT search), fabrication of non-existent sources should fall, but a hybrid variant the simple taxonomy misses appears instead: the source is real, but the cited URL is fabricated or broken, or points to a syndicated copy rather than the original. A Tow Center study of eight grounded AI search tools found wrong citations in more than 60% of responses; more than half of Gemini and Grok 3 responses cited fabricated or broken URLs (Grok 3 alone sent 154 of 200 cited links to error pages), with frequent misattribution to syndicated copies on Yahoo or AOL even for licensed publishers³. So grounding tends to move the failure from "the source does not exist" toward "the link or attribution is wrong," not to eliminate citation-quality failures.
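The grounded failure modes described above (a broken or fabricated URL, or a link to a syndicated copy instead of the original) can be told apart with a simple host check. The following is a minimal sketch, not the Tow Center's method: the function name, the labels, and the `SYNDICATOR_DOMAINS` set are hypothetical, and the HTTP status is assumed to be supplied by the caller so the example needs no network access.

```python
from urllib.parse import urlparse

# Hypothetical list for illustration; the study mentions copies on Yahoo and AOL.
SYNDICATOR_DOMAINS = {"yahoo.com", "aol.com"}


def _host_matches(host: str, domain: str) -> bool:
    """Return True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def classify_cited_link(cited_url: str, publisher_domain: str, http_status: int) -> str:
    """Label a cited link for a source that is already known to be real.

    http_status is assumed to come from the caller (for example, from a
    HEAD request made elsewhere); this function does no fetching itself.
    """
    host = (urlparse(cited_url).hostname or "").lower()
    # No parseable host, or an error status: the link cannot be followed.
    if not host or http_status >= 400:
        return "broken_or_fabricated_url"
    # Check the publisher first so a syndicator citing itself is not mislabeled.
    if _host_matches(host, publisher_domain.lower()):
        return "original"
    if any(_host_matches(host, d) for d in SYNDICATOR_DOMAINS):
        return "syndicated_copy"
    return "other_host"
```

For instance, `classify_cited_link("https://news.yahoo.com/story", "example.com", 200)` yields the `"syndicated_copy"` label, while the same call with an `example.com` URL and status 404 yields `"broken_or_fabricated_url"`. A status check alone cannot distinguish a fabricated URL from one that has merely moved, which is why the sketch groups the two.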

How to apply

For readers and anyone relying on AI answers:

Treat a cited source's existence as unverified until checked. Confirm the URL resolves and the work is real before relying on it. Retrieval-grounded answers with clickable links are usually safer than free-text references in an ungrounded chat reply, but the link itself can still be broken or point to the wrong copy, so check it.
In your own measurement, separate three checks in a citation probe protocol: does the cited source exist (fabrication check), does it support the claim (precision check), and is the answer grounded at all (grounding check).
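One way to keep the three checks separate in a measurement is to record them as independent fields per cited claim and report a rate for each. This is a rough sketch under stated assumptions: `CitationProbe` and `failure_rates` are hypothetical names, the check results are assumed to be filled in by a human reviewer or an external tool, and precision is only counted for sources that exist.

```python
from dataclasses import dataclass


@dataclass
class CitationProbe:
    """One cited claim, with check results recorded by a reviewer or tool."""
    grounded: bool        # grounding check: is the answer anchored in retrieved content?
    source_exists: bool   # fabrication check: is the cited source real?
    supports_claim: bool  # precision check: does the source back the claim?


def failure_rates(probes: list[CitationProbe]) -> dict[str, float]:
    """Return the share of probes failing each check, reported separately."""
    n = len(probes)
    if n == 0:
        return {}
    return {
        "ungrounded": sum(not p.grounded for p in probes) / n,
        "fabricated": sum(not p.source_exists for p in probes) / n,
        # Precision is only meaningful for a source that actually exists.
        "imprecise": sum(p.source_exists and not p.supports_claim for p in probes) / n,
    }


probes = [
    CitationProbe(grounded=True, source_exists=True, supports_claim=True),
    CitationProbe(grounded=False, source_exists=False, supports_claim=False),
    CitationProbe(grounded=True, source_exists=True, supports_claim=False),
    CitationProbe(grounded=False, source_exists=True, supports_claim=True),
]
rates = failure_rates(probes)
```

Keeping the rates separate avoids folding a fabricated source into a precision failure, which would hide exactly the distinction the protocol is meant to surface.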

For publishers:

You cannot prevent an engine from inventing sources; this is an engine-side failure. What you can do is be a strong grounded candidate: real, verifiable, well-structured pages are more useful for retrieval-grounded answering, though whether you are cited still depends on retrieval, ranking, query fit, and product behavior. Grounded citing is what tends to displace fabrication of non-existent sources, but it does not remove broken-link or misattribution failures.

How it relates to other concepts

It is the third member of the citation-quality stack, alongside hallucination grounding (is the answer grounded?) and citation precision (does a real source support the claim?). The natural sequence of checks is: is the answer grounded at all, then is the cited source real, then does it support the claim.
Related to sycophancy vs cite-able fact: both are LLM failure modes that erode trust in AI answers, one by flattering the user and one by inventing evidence.

¹ William H. Walters & Esther Isabelle Wilder, "Fabrication and errors in the bibliographic citations generated by ChatGPT," Scientific Reports 13:14045, 2023. nature.com/articles/s41598-023-41032-5. 636 citations across 84 papers; 55% of GPT-3.5 and 18% of GPT-4 citations fabricated (defined as citations that do not exist as actual works).
² Mata v. Avianca, Inc., 678 F.Supp.3d 443 (S.D.N.Y. 2023), Opinion and Order of June 22, 2023 (Castel, J.); a $5,000 sanction, jointly on the two plaintiff's attorneys and their law firm, for submitting a brief with fake cases fabricated by ChatGPT. law.justia.com/cases/federal/district-courts/new-york/nysdce/1:2022cv01461/575368/54.
³ Klaudia Jaźwińska & Aisvarya Chandrasekar, "AI Search Has a Citation Problem," Tow Center for Digital Journalism, Columbia Journalism Review, 2025-03-06. cjr.org/tow_center/we-compared-eight-ai-search-engines-theyre-all-bad-at-citing-news.php. Tested eight grounded AI search tools; wrong citations in more than 60% of responses; more than half of Gemini and Grok 3 responses cited fabricated or broken URLs (Grok 3 sent 154 of 200 cited links to error pages); frequent misattribution to syndicated copies on Yahoo News or AOL even for licensed publishers.
