Research · Dispatch #4

One panel, five engines, mostly separate citation sets: 'cited by AI' is not one thing

We ran the same prompt over the same pages against five AI engines in one round. Of the eighteen pages any engine cited, ten were cited by a single engine and none by more than three. A citation on one engine tells you little about the next. In this panel, each engine behaved like its own citation surface.

This is the fourth GEO Glossary dispatch. The earlier ones asked why a particular engine cited us. This one is about something underneath all of them: the engines mostly do not cite the same pages as each other.

The pattern

In a single weekly round we ran the same frozen prompt template, one fixed question per term, against five AI engines (ChatGPT, Claude, Gemini, Copilot, Perplexity) over the same panel of forty-four of our glossary pages, and logged which pages each engine named as a source. Eighteen of those pages were cited by at least one engine. The question we had not asked before is how many engines cited each.

| Cited by | Our pages |
| --- | --- |
| exactly one engine | 10 |
| exactly two engines | 5 |
| exactly three engines | 3 |
| four or five engines | 0 |

More than half of everything that got cited was cited by a single engine. Nothing was cited by four or five. The three pages that three engines agreed on were citation precision and recall, external-traffic-disambiguation, and sub-passage extraction, and we will come back to why those.
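The tally above can be sketched as follows, assuming each engine's log has been reduced to a set of cited page identifiers. The engine and page names are hypothetical toy data, not this round's log, and the function names are ours for illustration.

```python
from collections import Counter


def engines_per_page(cited_by_engine):
    """Map each cited page to the number of engines that cited it."""
    counts = Counter()
    for pages in cited_by_engine.values():
        counts.update(set(pages))  # set() guards against duplicate log rows
    return counts


def breadth_histogram(cited_by_engine):
    """Count pages by citation breadth: {1: pages cited by one engine, ...}."""
    return Counter(engines_per_page(cited_by_engine).values())


# Hypothetical toy log (NOT the dispatch's data).
toy_log = {
    "engine_a": {"page-1", "page-2", "page-3"},
    "engine_b": {"page-3", "page-4"},
    "engine_c": {"page-3", "page-5"},
}

hist = breadth_histogram(toy_log)
for breadth in sorted(hist):
    print(f"cited by exactly {breadth} engine(s): {hist[breadth]} page(s)")
```

Counting engines per page, rather than pages per engine, is what turns five separate lists into the breadth table.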

The pairwise view fills in the shape. Most engine pairs shared little: ChatGPT and Copilot shared nothing at all, and most other pairs shared one or two pages. The exception is ChatGPT and Perplexity, which shared four. So the sets are mostly separate, but not uniformly: one pair of engines overlaps clearly, and everywhere else agreement is thin. Asked "is this page cited by AI," the honest answer for most pages in this round was "by one engine, and not the others."
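The pairwise view is a set intersection per engine pair. A minimal sketch, again over hypothetical toy sets rather than the real panel log:

```python
from itertools import combinations


def pairwise_overlap(cited_by_engine):
    """Return {(engine_x, engine_y): number of shared cited pages} per pair."""
    return {
        (a, b): len(cited_by_engine[a] & cited_by_engine[b])
        for a, b in combinations(sorted(cited_by_engine), 2)
    }


# Hypothetical toy log (NOT the dispatch's data).
toy_log = {
    "engine_a": {"page-1", "page-2", "page-3"},
    "engine_b": {"page-2", "page-3", "page-4"},
    "engine_c": {"page-5"},
}

overlaps = pairwise_overlap(toy_log)
for (a, b), shared in overlaps.items():
    print(f"{a} & {b}: {shared} shared page(s)")
```

Five engines give ten pairs, so the full matrix is small enough to read by eye.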

It is not that one engine's list contains another's

The easy explanation for thin overlap is volume: maybe the engines just differ in how much they cite, and a generous engine's list quietly contains a stingy one's. It does not. The lists cross; they do not nest. Copilot, the stingiest engine, cited attribution rate and sub-passage extraction; ChatGPT, which cited five times as many pages, cited neither. Claude cited citation velocity, and no other engine did. Gemini alone cited generative search index. Each engine has pages that are its and no one else's, including the engines that barely cite anything.
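"Cross" versus "nest" is a subset test. A sketch under the assumption that each engine's citations are held as a set; the sets below are hypothetical toy data chosen to show one nested pair and one crossing pair.

```python
from itertools import permutations


def nested_pairs(cited_by_engine):
    """List (smaller, larger) pairs where one non-empty cited set sits inside another."""
    return [
        (small, big)
        for small, big in permutations(cited_by_engine, 2)
        if cited_by_engine[small] and cited_by_engine[small] <= cited_by_engine[big]
    ]


# Hypothetical toy log (NOT the dispatch's data).
toy_log = {
    "generous": {"page-1", "page-2", "page-3", "page-4"},
    "stingy_nested": {"page-1", "page-2"},    # fully contained in "generous"
    "stingy_crossing": {"page-2", "page-9"},  # page-9 is unique to it
}

print(nested_pairs(toy_log))
```

If volume alone explained the overlap, the stingiest engine would show up as the `small` side of a nested pair. An empty result means the lists cross.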

What the numbers do and do not prove

The discipline of these dispatches is to ask what else could produce a pattern before believing the dramatic version, so: thin overlap is, on its own, partly arithmetic. Each engine cites only a handful of pages out of the panel, and two small sets drawn from the same larger pool will overlap little no matter what is driving them. If two engines picked a and b pages independently and at random from the forty-four, they would be expected to share a × b / 44 of them. With ChatGPT naming ten pages and Gemini five, that chance model expects about one shared page (10 × 5 / 44 ≈ 1.1); they shared two. ChatGPT and Perplexity, at ten and eight pages, would be expected to share under two (10 × 8 / 44 ≈ 1.8), and shared four, which is above chance, not below it. Low overlap is not the same as negative correlation: with sparse citation sets, engines can agree rarely without actively disagreeing, and where two of them do agree, as ChatGPT and Perplexity do, they agree slightly more than random would predict. So we cannot claim from this round that the engines pull in opposite directions. What we can claim is narrower and still the operational part: citation is sparse on every engine, agreement between most pairs is thin, and a citation on one engine is a weak signal about most of the others.
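The chance model can be made explicit. The sketch below assumes the simplest version: one engine's set is fixed and the other's is a uniformly random draw from the panel, which makes the overlap hypergeometric. That is our simplifying assumption for illustration; real pages are not equally likely to be cited, so treat the tail probability as a rough yardstick rather than a significance test.

```python
from math import comb


def expected_overlap(a, b, n):
    """Expected shared pages when sets of size a and b are drawn at random from n."""
    return a * b / n


def prob_overlap_at_least(k, a, b, n):
    """P(overlap >= k) under the hypergeometric chance model."""
    total = comb(n, b)
    favourable = sum(
        comb(a, i) * comb(n - a, b - i) for i in range(k, min(a, b) + 1)
    )
    return favourable / total


PANEL = 44  # panel size stated in the dispatch

# Set sizes and observed overlaps as reported in the text.
for name, a, b, observed in [
    ("ChatGPT & Gemini", 10, 5, 2),
    ("ChatGPT & Perplexity", 10, 8, 4),
]:
    exp = expected_overlap(a, b, PANEL)
    tail = prob_overlap_at_least(observed, a, b, PANEL)
    print(f"{name}: expected {exp:.2f}, observed {observed}, "
          f"P(>= observed by chance) = {tail:.3f}")
```

With sets this small, even an overlap of twice the expected value is a modest deviation, which is why the claim stays at "above chance" and goes no further.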

Why the shared pages are shared, and the rest are not

Where engines agreed, the reason fits what earlier dispatches established about each one; the overlap structure is consistent with those per-engine patterns rather than proof of them, since the thin overlap itself is already explained by sparse sets. In our probes, Gemini and Perplexity have looked more tied to web-index visibility and ranking than the others; the last dispatch found Gemini's apparent taste for our coinages was largely Google rank rather than deference to us. ChatGPT's cited set did not track Google rank the way those two appeared to, and it leaned toward emerging and definitional terms. Claude was more likely to reach for the primary source where a clear one existed, citing the original paper over our explainer. Copilot, silent across our four earlier rounds, cited its first two pages this round, which may mean the Bing-side discoverability gap we have flagged before is loosening, or may just be volatility. Five products, five observed citation patterns, run over one panel, carved out five different slices of it.

The three pages multiple engines agreed on are the ones that clear several of those bars at once: citation precision and recall, external-traffic-disambiguation, and sub-passage extraction are all pages where we are both well ranked in Google and the clear definitional source for the term, so a rank-driven engine and a definition-driven one both land on them. The same logic explains the one strong pair: ChatGPT and Perplexity overlapped on the well-ranked coined-metric and academic pages, the ones that satisfy more than one rule at once, which is exactly where different engines stop disagreeing. Agreement, where it happens, is concentrated on pages that are good by more than one measure. Most pages are good by one measure, on one engine, or none.

Cautions on the numbers

We attach the same caveats to every dispatch, because they bound the reading. The panel over-samples our own coined terms on purpose, so these are not neutral pages. This is one round; citation is volatile, and next round's five sets will not be identical to this one's, which is part of the finding rather than a flaw in it. And "cited" here is elicited under a standard prompt, not what an engine volunteers. Read the overlap structure as a direction to retest, strong in this round but not a fixed constant of the engines.

How we measure "cited"

| Dimension | This dispatch |
| --- | --- |
| Rounds | 1 round (the 2026-06-16 panel) |
| Engines | ChatGPT, Claude, Gemini, Copilot, Perplexity (all five ran the full panel) |
| Panel | 44 glossary pages; over-samples our coined terms; not neutral |
| Mode | Logged out, private window, web search on, no personalization |
| Prompt | Fixed template, one question per term, asks the engine to cite its web sources |
| Counted as cited | Inline citation or folded "more sources" entry |
| Unit | A glossary page named at least once |
| Read as | Elicited, one round, volatile, small-n: a direction to retest |
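The "Counted as cited" and "Unit" rows amount to a small filtering rule. The sketch below is one way to express it, not our logging code: the `Citation` record, the `kind` labels, and the example.com URLs are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

# Hypothetical labels for the two citation placements that count.
COUNTED_KINDS = {"inline", "more_sources"}


@dataclass(frozen=True)
class Citation:
    url: str   # source URL the engine named
    kind: str  # where it appeared in the answer (hypothetical label)


def cited_pages(citations, panel_urls):
    """Panel pages named at least once in a counted placement."""
    return {
        c.url
        for c in citations
        if c.kind in COUNTED_KINDS and c.url in panel_urls
    }


# Hypothetical panel and response (NOT real data).
panel = {
    "https://example.com/glossary/term-a",
    "https://example.com/glossary/term-b",
}
response = [
    Citation("https://example.com/glossary/term-a", "inline"),
    Citation("https://example.com/glossary/term-a", "more_sources"),  # same page, counted once
    Citation("https://example.org/elsewhere", "inline"),              # not a panel page
]

print(cited_pages(response, panel))
```

Returning a set enforces the unit: a page named twice in one answer still counts once.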

The takeaway for a publisher

"AI citation" is not one number, and a dashboard that gives you one is averaging away the finding that matters. A page that leads on ChatGPT can be absent from Gemini. A page Gemini appears to cite partly because it ranks well in Google can be invisible to ChatGPT, whose citations did not track Google rank in this panel. If you track your AI visibility on a single engine, you are reading one of five weakly related signals; if you track it as one blended score, you are hiding that the engines reward different things and agree mostly on the few pages that are strong by several measures at once. Do not report one blended AI-citation score unless you also show the engine-level breakdown underneath it. The useful unit is the engine: pick the one whose pattern matches your territory and your rank, measure that one on its own, and treat the others as separate instruments reading separate weather.

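A report that keeps the engine-level breakdown next to any blended figure could look like the sketch below. The "any-engine share" is only one possible blended score, chosen here for illustration, and the engine names and sets are hypothetical toy data.

```python
def engine_breakdown(cited_by_engine, panel_size):
    """Per-engine share of the panel cited, plus the share cited by any engine."""
    per_engine = {
        engine: len(pages) / panel_size
        for engine, pages in cited_by_engine.items()
    }
    any_engine = len(set().union(*cited_by_engine.values())) / panel_size
    return per_engine, any_engine


# Hypothetical toy log over a 10-page panel (NOT the dispatch's data).
toy_log = {
    "engine_a": {"p1", "p2", "p3", "p4"},
    "engine_b": {"p4"},
    "engine_c": set(),
}

per_engine, blended = engine_breakdown(toy_log, panel_size=10)
print(f"blended (any engine): {blended:.0%}")
for engine, share in per_engine.items():
    print(f"  {engine}: {share:.0%}")
```

In the toy data the blended figure equals the best engine's share and says nothing about the engine that cited no pages, which is the information a single number averages away.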