All terms
The 2026 vocabulary of Generative Engine Optimization, with live per-term citation status across ChatGPT, Perplexity, Claude, Copilot, and Gemini.
Citation surfaces (12)
intermediate
AI dev tool citations
AI dev tool citations are the source attributions surfaced by AI-assisted developer environments (Cursor, Windsurf, Claude Code, Replit Agent, Bolt, Lovable, GitHub Copilot Chat) when their assistants use web search to ground responses to developer questions. The surface category emerged in 2024-2025. Citation tracking is harder than for general AI search because desktop dev tools open citation links in the system browser without a Referer header.
GPT0× Plx0× Cld Cop Gem
intermediate
AI Mode
AI Mode is Google's conversational AI search surface, accessed via a dedicated tab in Search results. It launched in Search Labs in March 2025, went to US public release in May 2025, and expanded to roughly 180 countries by August 2025. Powered by Gemini-family models with Google Search grounding.
GPT0× Plx0× Cld Cop Gem
intermediate
AI Overview citation
An AI Overview citation is the discrete event of a webpage being included in the source-link panel beneath Google's AI Overview answer in standard Google Search SERPs. Distinct from AI Mode citations (Google Search's separate conversational tab) and from Gemini citations (Google's standalone Gemini chatbot), AI Overview citation is the SERP-level event tracked when AI-search programs measure Google citation share. Google has explicitly stated that no special markup or optimization is required to appear; standard Google Search indexability is the prerequisite.
GPT· Plx0× Cld0× Cop0× Gem0×
intermediate
Brave Search AI citation
Brave Search AI citation is the discrete event of a webpage being included as a cited source in one of Brave Search's AI features: AI Answers (concise summary with cited sources), Ask Brave (longer answers with chat and Deep Research), Featured Snippets (extractive snippet that predates generative AI), or AI-powered descriptions. Distinct from Bing-grounded AI surfaces (Microsoft Copilot for web sources) and Google-derived AI surfaces (AI Overview, AI Mode, Gemini): Brave operates a fully independent search index, so AI citation on Brave depends on Brave's own crawl and indexing decisions, not Bing's or Google's.
GPT· Plx0× Cld Cop Gem
intermediate
ChatGPT search citation
ChatGPT search citations are the source attributions OpenAI's ChatGPT produces when its web search tool returns real-time web content for grounding. Citations appear inline in the ChatGPT consumer surfaces (chatgpt.com web, Desktop apps for Mac and Windows, mobile apps for iOS and Android), in the standalone ChatGPT Atlas browser (macOS, launched October 2025), and as structured response fields in the OpenAI API. Distinct measurement target from Perplexity, Claude, Microsoft Copilot, AI Overview, and Gemini citations.
GPT· Plx0× Cld Cop Gem
intermediate
Claude citation
Claude citations are the source attributions Anthropic's Claude produces when its web search tool returns real-time web content for grounding. Citations appear inline as source chips in consumer surfaces (claude.ai web app, mobile, Claude Desktop) and as structured web_search_result_location fields in the Anthropic API response. Distinct measurement target from Perplexity, Microsoft Copilot, AI Overview, and AI Mode.
GPT Plx0× Cld Cop Gem
intermediate
DuckDuckGo AI citation
DuckDuckGo AI citation is the discrete event of a webpage being included as a linked source in DuckDuckGo's AI surfaces. Two surfaces matter: Search Assist (the AI-generated inline answer above DuckDuckGo search results, formerly DuckAssist, which always links to one or two sources beneath the summary) and Duck.ai (the privacy-anonymized chat interface to third-party models, where citation behavior depends on the underlying model). DuckDuckGo runs its own crawler, DuckAssistBot, for Search Assist.
GPT· Plx· Cld0× Cop0× Gem0×
intermediate
Gemini citation
Gemini citations are the source attributions Google's Gemini chatbot and API produce when answering questions with grounded real-time web content. Citations appear inline in the Gemini chatbot app (gemini.google.com and mobile) and as structured groundingMetadata fields (with groundingChunks containing uri and title) in the Gemini API response. Distinct from AI Overview and AI Mode, which are Google Search surfaces, not Gemini app surfaces.
GPT Plx Cld Cop Gem0×
intermediate
Grok citation
Grok citation is the discrete event of a webpage being included as a cited source in xAI's Grok answer surfaces: WebSearch (index-based retrieval inside chat), DeepSearch (multi-step research with web + X integration and a visible reasoning trace), and the xAI API web_search tool (citations returned as structured response fields). Distinct from other AI citation surfaces because Grok pairs a general web index with native X (Twitter) data access, and because xAI's public crawler discipline is unusually opaque.
GPT· Plx0× Cld0× Cop0× Gem0×
intermediate
Meta AI citation
Meta AI citation is the discrete event of being included as a source in Meta AI's answers across its surfaces (WhatsApp, Instagram, Messenger, Facebook, meta.ai web, the 2025 standalone Meta AI app, Ray-Ban Meta smart glasses, Quest VR). Since December 2025 Meta AI citation operates in two tiers: licensed publisher partners (CNN, Fox News, Fox Sports, USA Today, People Inc, Daily Caller, Washington Examiner, Le Monde Group) receive linked citations with referral traffic, while non-licensed publishers may still be summarized without inline attribution. Meta AI is the only surface in this cluster whose citation behavior depends on a commercial licensing tier.
GPT0× Plx0× Cld0× Cop0× Gem0×
intermediate
Microsoft Copilot citations
Microsoft Copilot citations are the source attributions Microsoft's Copilot AI assistant produces across its product family. The Copilot brand spans two distinct grounding paths: public-web Copilot surfaces (Copilot.com web, mobile apps, Edge Copilot, Windows Copilot) ground in the Bing web index and are publisher-addressable via Bing SEO and IndexNow; enterprise Microsoft 365 Copilot grounds primarily in Microsoft Graph tenant data and is largely outside publisher scope. M365 Copilot Chat (formerly Bing Chat Enterprise) sits between the two, using public web plus optional org data.
GPT0× Plx0× Cld0× Cop0× Gem0×
intermediate
Perplexity citation
Perplexity citations are the numbered source attributions Perplexity displays in its AI-search answers across its web app, mobile apps, Comet browser, and Sonar API. Citations appear inline as bracketed numbers ([1], [2], [3]) linked to a source panel that lists each cited URL. Perplexity citations are a distinct measurement target from ChatGPT, Claude, Google AI Overview, and Microsoft Copilot citations: each engine uses different retrieval, weights signals differently, and tends to cite slightly different source pools.
GPT· Plx0× Cld0× Cop0× Gem0×
GEO content methods (10)
Cluster pillar
GEO content methods
GEO content methods are the content rewrites tested in Aggarwal et al. 2023; most are weak or null levers rather than the 30-40% lift the headlines claim.
GPT Plx Cld Cop Gem
intermediate
Authoritative Statement Strength
Authoritative statement strength is widely recommended in SEO content as a citation lever. Aggarwal et al. 2023's GEO paper tested 'Authoritative' tone as one of nine content-modification methods and reported verbatim 'to the contrary we find no significant improvement', a null finding rather than a modest lift. The +11.8% relative gain in raw PAWC numbers (21.8 vs baseline 19.5) was not framed by the paper as statistically meaningful. The folk wisdom that authoritative tone is a primary AI-citation lever has no empirical support in the only public benchmark; by the paper's own wording, the result is null.
GPT· Plx· Cld· Cop0× Gem·
advanced
Black-hat C-SEO
Black-hat C-SEO is the use of adversarial techniques (most notably prompt injection hidden in page content) to manipulate an AI engine's ranking or citation behavior through deception rather than genuine content quality. It is the adversarial counterpart to white-hat C-SEO, which improves a page's actual clarity and usefulness. Beyond likely violating many platform terms, black-hat C-SEO is detectable, unreliable as models and defenses evolve, and a poor bet given that even the white-hat methods tested in C-SEO Bench show limited measured effect.
GPT0× Plx0× Cld0× Cop0× Gem·
advanced
C-SEO Bench
C-SEO Bench is the Puerto et al. 2025 NeurIPS Datasets & Benchmarks paper that evaluates 9 Conversational Search Engine Optimization methods across 6 domains, two tasks (question answering + product recommendation), and continuous multi-actor adoption rates. Its headline finding is that most current C-SEO methods are largely ineffective once tested outside the single-actor synthetic conditions of prior GEO benchmarks; a traditional retrieval-ranking SEO baseline (moving the source to context position 1) is roughly 7.6× more effective in their retail-domain measurement than the best C-SEO method tested.
GPT0× Plx0× Cld0× Cop0× Gem·
intermediate
Cite Sources Optimization
Cite Sources Optimization is one of the four top-performing source-content modification methods in Aggarwal et al. 2023's GEO paper. The method actively rewrites content to add inline source citations for claims made, scoring PAWC 24.9 vs baseline 19.5 (~28% relative gain). As a practitioner discipline, it extends the paper's one-shot intervention into a habitual writing technique.
GPT Plx· Cld Cop Gem
intermediate
Definition-Lead Style
Definition-lead style is the writer discipline of opening an answer block (the first paragraph of a term entry, a FAQ answer, or a content section) with a complete, self-contained definition before any elaboration. The discipline pairs with the extractive QA tradition (Rajpurkar et al. 2016 SQuAD) and this glossary's own answer-block convention (itself a glossary-coined practitioner concept). When a human reader scans the opening paragraph or an automated system extracts or summarizes it, the standalone definition is what gets surfaced.
GPT· Plx Cld Cop Gem
intermediate
Fluency Optimization
Fluency Optimization is one of the four top-performing source-content modification methods in Aggarwal et al. 2023's GEO paper. The method actively rewrites content for better readability, clarity, and flow, scoring PAWC 25.1 vs baseline 19.5 (~29% relative gain). The paper also found that combining Fluency Optimization with Statistics Addition outperforms any single GEO method by more than 5.5%, the strongest of the pairwise combinations measured in its top-4 combination experiment.
GPT Plx Cld· Cop Gem
intermediate
Keyword Stuffing
Keyword Stuffing is the Aggarwal et al. 2023 GEO paper's flagship negative result: the paper tested rewriting source content to include more query-relevant keywords (the traditional SEO tactic) and characterized the result verbatim as 'little to no performance improvement on Generative Engine's responses' in Section 4. The Table 1 main GEO-bench raw PAWC measurement (17.8 vs baseline 19.5, mathematically -8.7%) is consistent with the null prose; the Table 5 Perplexity.ai prose escalates further, characterizing Keyword Stuffing as performing 10% worse than the Perplexity baseline. This entry documents the paper finding and the 2025 C-SEO Bench follow-up that confirms the null/negative result under multi-actor production-realistic conditions.
GPT0× Plx0× Cld0× Cop0× Gem0×
intermediate
Quotation Addition
Quotation Addition is the Aggarwal et al. 2023 GEO paper's top-performing source-content modification method (PAWC 27.8 vs baseline 19.5, ~43% relative gain): actively rewriting content to include sourced direct quotations from authorities. As a practitioner discipline, it extends the paper's one-shot intervention into a habitual writing technique.
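The relative gains quoted across this cluster all come from the same arithmetic: (score - baseline) / baseline. The sketch below recomputes them from the raw PAWC figures as quoted in these entries; the dictionary and function names are illustrative, and the figures are copied from the glossary text, not re-derived from the paper.

```python
# Raw PAWC scores as quoted in this glossary's entries (baseline 19.5).
BASELINE_PAWC = 19.5
QUOTED_PAWC = {
    "Authoritative": 21.8,
    "Cite Sources": 24.9,
    "Fluency Optimization": 25.1,
    "Keyword Stuffing": 17.8,
    "Quotation Addition": 27.8,
}


def relative_gain(score: float, baseline: float = BASELINE_PAWC) -> float:
    """Percent change of a score relative to the baseline."""
    return (score - baseline) / baseline * 100


for method, score in QUOTED_PAWC.items():
    print(f"{method}: {relative_gain(score):+.1f}%")
```

A relative gain says nothing about statistical significance, which is why the Authoritative entry can show a positive number and still be reported as a null finding.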
GPT Plx Cld· Cop Gem
intermediate
Statistical Density
Statistical density is a practitioner-coined shorthand for the content property that the Aggarwal et al. 2023 GEO paper's 'Statistics Addition' method tries to increase: presence of verifiable statistics, dates, and numerical claims. The term itself and any specific ratio definition are practitioner-derived, not paper measurements.
GPT Plx Cld· Cop Gem
Schema cluster (6)
foundational
Article Schema
Article schema (schema.org/Article) is the JSON-LD type for marking up news articles, blog posts, and long-form editorial content. It exposes authorship, publication date, and topical-scope metadata that AI engines and search engines can parse, though no major engine has publicly confirmed how heavily these fields weight in citation or ranking decisions.
GPT Plx0× Cld Cop Gem
foundational
BreadcrumbList Schema
BreadcrumbList (schema.org/BreadcrumbList) is the JSON-LD type that declares a page's position in the site hierarchy. It powers Google's breadcrumb rich result in the SERP and can provide machine-readable site-hierarchy context where systems parse structured data.
GPT Plx0× Cld Cop Gem
intermediate
DefinedTerm schema
DefinedTerm is a schema.org type representing a term defined elsewhere (typically in a glossary), helping systems that parse structured data understand a term, its definition, and its relationship to a glossary source.
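As a rough illustration, a glossary entry could be described with DefinedTerm markup along these lines. The type and property names (DefinedTerm, DefinedTermSet, inDefinedTermSet) come from schema.org; every value, including the example.com URLs, is a placeholder.

```python
import json

# Hypothetical values for illustration; only the schema.org type and
# property names are real.
defined_term = {
    "@context": "https://schema.org",
    "@type": "DefinedTerm",
    "name": "Citation share",
    "description": (
        "The relative percentage of citations a source receives "
        "versus competitors across AI-engine responses."
    ),
    "url": "https://example.com/glossary/citation-share",
    "inDefinedTermSet": {
        "@type": "DefinedTermSet",
        "name": "Example GEO glossary",
        "url": "https://example.com/glossary",
    },
}

json_ld = json.dumps(defined_term, indent=2)
print(json_ld)
```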
GPT0× Plx· Cld0× Cop0× Gem0×
intermediate
FAQ Schema
FAQ Schema (FAQPage) is the schema.org JSON-LD type for marking up question-and-answer pairs on a page, commonly used to help search engines and other systems parse visible Q&A content in a machine-readable format. Google fully deprecated FAQ rich results on May 7, 2026; the markup remains valid for machine-readability purposes.
GPT· Plx Cld Cop Gem
intermediate
HowTo Schema
HowTo schema (schema.org/HowTo) is the JSON-LD type for marking up step-by-step instructional content: long-form how-to guides, tutorials, and procedural articles.
GPT0× Plx0× Cld0× Cop0× Gem0×
foundational
JSON-LD
JSON-LD (JavaScript Object Notation for Linked Data) is the W3C-standard syntax for embedding schema.org structured data into web pages. It is the most commonly recommended format for publishing structured data and is widely parsed by AI search engines and Google Search for entity, definition, and FAQ markup.
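To make the embedding step concrete, here is a minimal sketch that serializes a schema.org object and wraps it in the script element JSON-LD is conventionally published in. The helper name to_json_ld_script and all article values are hypothetical.

```python
import json


def to_json_ld_script(data: dict) -> str:
    """Wrap a schema.org object in a JSON-LD script element."""
    body = json.dumps(data, indent=2)
    # Escape "</" so a string value can never close the script element early;
    # "\/" is a valid JSON escape for "/".
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Placeholder article metadata for illustration only.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Example headline",
    "author": {"@type": "Person", "name": "Example Author"},
    "datePublished": "2026-01-05",
    "dateModified": "2026-01-12",
}

snippet = to_json_ld_script(article)
print(snippet)
```

The resulting string is typically placed in the page's head or body; the markup should describe content that is actually visible on the page.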
GPT0× Plx0× Cld0× Cop0× Gem0×
Retrieval pipeline (16)
intermediate
Agentic retrieval
Agentic retrieval is a search pattern where an AI agent autonomously decides what to query, when to query again, and which sources to consult. It replaces single-shot keyword retrieval with iterative, goal-directed information gathering.
GPT· Plx0× Cld0× Cop0× Gem0×
advanced
BM25
BM25 (Best Matching 25) is a probabilistic ranking function used by classical search engines and the lexical layer of modern hybrid retrieval systems. It is the standard mechanism for scoring exact-keyword match in search retrieval; its application inside specific commercial AI search engines is not vendor-documented but is consistent with observable lexical-signal behavior.
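For readers who want to see the scoring itself, below is a small sketch of Okapi BM25 over whitespace-tokenized text. The parameter defaults (k1 = 1.5, b = 0.75) and the smoothed IDF variant are common choices assumed here, not values taken from any particular engine.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    doc_freq = Counter(term for d in tokenized for term in set(d))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            n_t = doc_freq[term]
            idf = math.log(1 + (n_docs - n_t + 0.5) / (n_t + 0.5))
            tf = term_freq[term]
            # Length normalization: long documents need more matches.
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "citation share measures relative presence",
    "vector embeddings capture meaning",
    "citation velocity tracks new citation counts over time",
]
scores = bm25_scores("citation share", docs)
print(scores)
```

The first document wins because it matches both query terms, including the rarer one; the second scores zero because BM25 only rewards exact term matches.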
GPT· Plx0× Cld Cop Gem
intermediate
Chunking
Chunking is the preprocessing step in a retrieval pipeline that splits a document into smaller segments (chunks, typically no more than a few hundred tokens each) so each can be embedded, indexed, and retrieved independently. Because a retrieval system typically scores and passes chunks rather than whole pages, a publisher does not control where their content is split and should instead write self-contained passages that survive any chunk boundary.
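One simple chunking strategy, shown here only as a sketch, is a fixed-size word window with overlap. Real pipelines may split on tokens, sentences, or headings instead; the function name and default sizes are assumptions.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping fixed-size word windows."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(sample, chunk_size=4, overlap=1)
print(chunks)
```

Because the boundaries fall wherever the window happens to land, a sentence that depends on the previous paragraph for its meaning can end up stranded in a different chunk.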
GPT0× Plx0× Cld0× Cop0× Gem0×
advanced
Context assembly
Context assembly is the stage between retrieval and generation where an AI system selects, orders, and packs retrieved passages into the model's context window before it generates an answer. In the retrieve-then-generate pipeline (RAG, Lewis et al. 2020), retrieval finds candidate passages; assembly decides which of them actually enter the prompt, in what order, within the token budget. It is where position effects like lost-in-the-middle (Liu et al. 2023) bite, so it, not retrieval alone, decides whether your passage is used.
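The selection-under-budget part of assembly can be sketched as a greedy packer. This is one plausible policy, not a documented vendor algorithm: passages are taken in descending retrieval score, and token cost is approximated by whitespace word count.

```python
def assemble_context(passages: list[tuple[str, float]], token_budget: int) -> list[str]:
    """Greedily pack the highest-scoring passages that fit the budget."""
    selected = []
    used = 0
    for text, _score in sorted(passages, key=lambda p: p[1], reverse=True):
        cost = len(text.split())  # crude token estimate: whitespace words
        if used + cost <= token_budget:
            selected.append(text)
            used += cost
    return selected


candidates = [
    ("short relevant passage", 0.9),
    ("a much longer passage that pads the answer with filler words", 0.8),
    ("another concise passage", 0.7),
]
context = assemble_context(candidates, token_budget=7)
print(context)
```

In this toy run the long second-ranked passage is retrieved but never enters the prompt, while a shorter, lower-ranked passage does.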
GPT Plx Cld Cop Gem
intermediate
Deep research mode
Deep research mode is the agentic, multi-step feature in AI search and assistant products that autonomously runs dozens to hundreds of web searches, reasons across the results, and produces a long, fully-cited report instead of a quick answer. A first wave launched between December 2024 and February 2025, with Microsoft and Anthropic following in spring 2025.
GPT Plx Cld Cop Gem
intermediate
Generative search index
Generative search index is a glossary-coined practitioner shorthand for the retrieval-corpus backend that AI search engines query when fetching passages for generation. Standard industry terms for the underlying systems include vector database, RAG backend, hybrid search system, and (at the component layer) search index.
GPT0× Plx0× Cld0× Cop0× Gem·
advanced
Hybrid retrieval
Hybrid retrieval combines lexical (keyword) search with semantic (vector embedding) search in a single ranking pipeline. It is a common production pattern in modern retrieval and RAG systems and is likely used in some form by many AI-search products, though individual vendors rarely disclose their full retrieval architecture.
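One widely used way to merge a lexical ranking with a semantic ranking is reciprocal rank fusion (RRF). It is shown here as an example of a fusion step, not as what any named AI-search product does; the constant k = 60 is a conventional default and the document ids are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists; each hit contributes 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


lexical = ["doc_a", "doc_b", "doc_c"]   # e.g. from a keyword ranker
semantic = ["doc_c", "doc_a", "doc_d"]  # e.g. from an embedding ranker
fused = reciprocal_rank_fusion([lexical, semantic])
print(fused)
```

Documents that appear in both lists rise to the top, which is the practical argument for content that matches on exact terms and on meaning.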
GPT Plx· Cld Cop Gem
advanced
Inverted index
An inverted index is the data structure classical search engines use to look up which documents contain a given term. It is the foundation under BM25 ranking and a common lexical layer in many modern hybrid retrieval systems.
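A toy version of the structure maps each term to the set of documents containing it. The sketch below uses whitespace tokenization and AND semantics for multi-term lookups; both are simplifying assumptions, and the document ids are invented.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


def lookup(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of documents containing every query term."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()


docs = {
    "d1": "citation share by engine",
    "d2": "citation velocity by week",
    "d3": "vector embeddings",
}
index = build_inverted_index(docs)
print(lookup(index, "citation by"))
```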
GPT· Plx Cld Cop Gem
intermediate
Passage-level optimization
Passage-level optimization is the practice of structuring individual paragraphs and sections so they remain understandable when extracted independently. It is a content-design discipline derived from how retrieval-augmented and AI-search systems often operate over passages or chunks rather than whole documents.
GPT Plx· Cld Cop Gem
intermediate
Query fan-out
Query fan-out is the technique where an AI search engine breaks a single query into multiple sub-queries, runs them in parallel, and synthesizes the results into one cited answer. Google documents it for AI Mode and Deep Search; practitioners also invoke it to explain why AI engines cite pages outside the top 10 organic results.
GPT Plx Cld Cop Gem
intermediate
RAG (Retrieval-Augmented Generation)
Retrieval-Augmented Generation (RAG) is the architectural pattern that combines a language model's generation capabilities with a retrieval system that fetches relevant external content at query time, grounding responses in that retrieved content. Whether the retrieved sources are exposed as visible citations is a product-layer design choice rather than an inherent property of RAG.
GPT0× Plx0× Cld0× Cop0× Gem0×
advanced
Reranking
Reranking is the second-stage retrieval step where a more expensive model rescores the top-N candidates returned by an initial retriever. It makes hybrid retrieval feasible at scale: a cheap first stage casts a wide net for recall, and the costlier second stage restores precision on that short list.
GPT0× Plx0× Cld0× Cop0× Gem0×
advanced
Retrievability
Retrievability is an information-retrieval measure (Azzopardi & Vinay 2008) of how easily a document can be retrieved across a whole population of queries: the more queries that return it, and the higher its rank, the more retrievable it is. In AI search it names the upstream lever that content optimization skips: whether the engine's retrieval step can find and pull your page into the answer at all. The GEO evidence suggests this may be a more durable lever than isolated in-page rewrites.
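A simplified cumulative version of the measure counts how many queries return a document within a rank cutoff. The sketch assumes every query is weighted equally and that rankings are already available as lists of ids; the query strings and page names are invented.

```python
def retrievability(doc_id: str, rankings: dict[str, list[str]], cutoff: int = 10) -> int:
    """Count the queries whose top-`cutoff` results include doc_id."""
    return sum(1 for ranked in rankings.values() if doc_id in ranked[:cutoff])


# Hypothetical ranked results for three queries.
rankings = {
    "what is citation share": ["page_a", "page_b", "page_c"],
    "citation share vs share of voice": ["page_b", "page_a"],
    "ai citation metrics": ["page_c", "page_d", "page_a"],
}

print(retrievability("page_a", rankings, cutoff=2))
print(retrievability("page_d", rankings, cutoff=2))
```

A page that never clears the cutoff for any query has a score of zero, no matter how well its passages are written.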
GPT Plx Cld Cop Gem
intermediate
Sub-document retrieval
Sub-document retrieval is the practice of indexing and retrieving passages or paragraphs rather than whole documents. It is a common retrieval pattern in RAG and AI-search systems, especially when long documents need to be matched against specific user queries.
GPT Plx· Cld Cop Gem
intermediate
Sub-passage extraction
Sub-passage extraction is a practitioner shorthand for the content-level phenomenon of answer systems quoting a single sentence- or claim-level fragment from a retrieved passage. In classical IR the same operation is called extractive QA or span selection; this entry uses 'sub-passage extraction' to align with the 'sub-document retrieval' framing and to cover both classical and LLM-era behavior under one term.
GPT· Plx0× Cld0× Cop0× Gem·
intermediate
Vector embeddings
Vector embeddings are numerical representations of text passages used by AI search engines to find semantically similar content. They are the underlying mechanism behind meaning-based retrieval rather than keyword matching.
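Semantic similarity between two embeddings is commonly measured with cosine similarity. The three-dimensional vectors below are hand-written stand-ins; real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors for illustration only.
query_vec = [0.9, 0.1, 0.0]
passages = {
    "on_topic": [0.8, 0.2, 0.1],
    "off_topic": [0.0, 0.1, 0.9],
}
ranked = sorted(
    passages,
    key=lambda name: cosine_similarity(query_vec, passages[name]),
    reverse=True,
)
print(ranked)
```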
GPT0× Plx0× Cld0× Cop0× Gem0×
Citation metrics (11)
Cluster pillar
AI citation metrics
The six AI citation metrics GEO programs track: attribution rate, citation share, citation match rate, cite-ability, citation velocity, and citation rotation.
GPT Plx0× Cld Cop Gem
intermediate
AI visibility
AI visibility is the practitioner and vendor umbrella term for how often and how prominently a brand, site, or page appears across AI-generated answers. It is not a single metric: it bundles brand mentions, citation share, and attribution rate, which this glossary tracks separately.
GPT Plx Cld Cop Gem
intermediate
Attribution rate
Attribution rate (in AI search / GEO) is the percentage of evaluated AI-engine responses that cite a specific source or domain for a defined prompt set. One of the most commonly used proxies for GEO success; distinct from traditional marketing attribution, which credits conversions across touchpoints.
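In code, the definition reduces to a ratio over a fixed prompt set. The sketch below assumes each recorded response has already been reduced to its list of cited URLs, and matches on exact hostname; subdomain handling is left out.

```python
from urllib.parse import urlparse


def attribution_rate(responses: list[list[str]], domain: str) -> float:
    """Percent of responses with at least one cited URL on `domain`."""
    if not responses:
        return 0.0
    hits = sum(
        1
        for cited_urls in responses
        if any(urlparse(url).hostname == domain for url in cited_urls)
    )
    return hits / len(responses) * 100


# Hypothetical probe results: one list of cited URLs per evaluated response.
responses = [
    ["https://example.com/a", "https://other.org/x"],
    ["https://other.org/y"],
    [],
    ["https://example.com/b"],
]
print(attribution_rate(responses, "example.com"))
```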
GPT0× Plx0× Cld0× Cop0× Gem0×
intermediate
Brand mentions in AI answers
Brand mentions in AI answers are instances where an AI engine names a brand, product, or company in its generated response. They are distinct from linked citations because the mention may not include a clickable source URL.
GPT Plx0× Cld Cop Gem
intermediate
Citation Footprint
Citation footprint is a glossary-coined metric for the cumulative breadth of a site's AI-cited content: the distinct pages that AI search engines have cited at least once, tracked over time and across engines. It isolates citation coverage (how much of your library has ever been cited) from intensity at a point in time (citation share).
GPT Plx Cld Cop Gem
intermediate
Citation match rate
Citation match rate is the percentage of AI-engine references to a source that include a clickable link back to that source. Computed as (linked citations) ÷ (all attributed references) × 100, it isolates the link-bearing subset of attribution from unlinked mentions in the same response stream.
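The formula above translates directly into code. In this sketch each attributed reference is represented by a boolean that is True when the reference carried a clickable link; that representation is an assumption made for brevity.

```python
def citation_match_rate(references_linked: list[bool]) -> float:
    """(linked citations) / (all attributed references) * 100."""
    if not references_linked:
        return 0.0
    return sum(references_linked) / len(references_linked) * 100


# Four attributed references, three of them with a clickable link.
print(citation_match_rate([True, True, False, True]))
```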
GPT· Plx0× Cld· Cop0× Gem·
intermediate
Citation rotation
Citation rotation is the rate at which the sources an AI search engine cites for a given query change over time. In high-rotation measurement windows the cited-source set may change weekly or faster; in low-rotation windows the same top sources persist for months. Practitioners measure rotation as a separate dimension from citation share (relative presence) and citation velocity (rate of new citations). Discussed across the literature under multiple names: citation volatility, source pool cycling, source rotation, and (as the inverse) citation persistence. The underlying mechanism (retrieval, ranking, grounding, or UI selection) is not vendor-documented at the per-query level.
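The entry does not fix a formula for rotation. One possible operationalization, offered purely as an assumption, is the Jaccard distance between the cited-source sets of two consecutive measurement windows for the same query: 0.0 means the set is unchanged, 1.0 means it was fully replaced.

```python
def rotation(previous: set[str], current: set[str]) -> float:
    """Jaccard distance between two windows' cited-source sets."""
    union = previous | current
    if not union:
        return 0.0
    return 1 - len(previous & current) / len(union)


# Hypothetical cited domains for one query in two consecutive weeks.
week_1 = {"a.com", "b.com", "c.com"}
week_2 = {"b.com", "c.com", "d.com"}
print(rotation(week_1, week_2))
```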
GPT Plx0× Cld Cop Gem
intermediate
Citation share
Citation share is the relative percentage of citations a source receives versus competitors across AI-engine responses on a given topic. It is the AI-search analog (not direct equivalent) of traditional share of voice, measuring relative presence rather than absolute volume.
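Computed over a topic's responses, citation share is each domain's count divided by all citations observed. The sketch assumes the input is a flat list with one entry per citation event; the domains are placeholders.

```python
from collections import Counter


def citation_share(cited_domains: list[str]) -> dict[str, float]:
    """Percent of all observed citations that went to each domain."""
    counts = Counter(cited_domains)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {domain: count / total * 100 for domain, count in counts.items()}


observed = ["a.com", "b.com", "a.com", "c.com"]
print(citation_share(observed))
```

Because the shares sum to 100, one domain's gain is always another's loss, which is what makes this a relative rather than an absolute measure.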
GPT Plx0× Cld· Cop Gem
intermediate
Citation velocity
Citation velocity is the rate at which new AI-engine citations to a source accumulate over time. Where attribution rate is a point-in-time ratio, citation velocity is a temporal leading indicator: new citations per fixed window, by engine, for a given query set. The two have different units (a count per time vs a ratio) and are not strict mathematical derivatives of each other; the practical relationship is that velocity often moves before attribution rate does.
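As a count per fixed window, velocity can be sketched by bucketing the dates on which new citations were first observed. ISO weeks are used as the window here by assumption; any fixed window would do.

```python
from collections import Counter
from datetime import date


def citation_velocity(first_seen: list[date]) -> dict[tuple[int, int], int]:
    """New citations per ISO (year, week) window."""
    weeks = Counter()
    for day in first_seen:
        iso = day.isocalendar()
        weeks[(iso[0], iso[1])] += 1
    return dict(sorted(weeks.items()))


# Hypothetical first-seen dates for new citations to one source.
first_seen = [date(2026, 1, 5), date(2026, 1, 6), date(2026, 1, 12)]
print(citation_velocity(first_seen))
```

The output is a count per window, not a ratio, which is the unit difference the definition points out.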
GPT Plx0× Cld Cop Gem
intermediate
Citation vs mention vs link
Citation, mention, and link are three distinct ways an AI answer can surface a source, brand, entity, or URL. Citation is a structured reference indicating the engine presents that source as supporting part of the answer. Mention is naming a brand, source, entity, or product inline without necessarily linking. Link is the URL artifact itself. The three are independent dimensions, not a containment relationship: a mention can be linked or unlinked; a citation typically includes a link but the act of citing is separate from the linkability.
GPT Plx0× Cld Cop Gem
intermediate
Cite-ability
Cite-ability is a practitioner-coined content property describing how suitable a passage is for AI extraction, quotation, and attribution. It is informed by factors like structural clarity, self-contained phrasing, and attribution clarity, but it is not a formal industry metric and is not defined in any major academic paper.
GPT· Plx· Cld· Cop0× Gem0×
Search foundations (11)
foundational
AI Overview
AI Overview is Google's generative search feature that produces AI-synthesized answers at the top of results, citing multiple sources. It is the 2024 rename and rollout of Search Generative Experience (SGE).
GPT· Plx0× Cld0× Cop0× Gem0×
intermediate
Answer block
An answer block is a single self-contained content unit (typically one paragraph or a brief structured list) designed to be easy for search and AI-answer systems to extract, summarize, or cite as a response to a specific user query.
GPT0× Plx0× Cld0× Cop0× Gem0×
intermediate
Authority signals
Authority signals are observable indicators practitioners use to assess whether a source is likely to be trusted, retrieved, cited, or ranked by search and AI-search systems. They encompass backlinks, entity recognition, author transparency, freshness, structured data, and brand mentions across channels. No engine has published its per-signal weighting; the practitioner discussion has shifted meaningfully in the 2026 AI-search era.
GPT Plx0× Cld Cop Gem
intermediate
E-E-A-T (AI search context)
E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness) is Google's content-quality framework for human Search Quality Raters. Google explicitly states it is not a direct ranking factor in classic Search; whether AI engines use it as a citation signal is not vendor-documented. Practitioners use it as a lens for source-trust optimization.
GPT0× Plx0× Cld0× Cop0× Gem0×
intermediate
Entity-based SEO
Entity-based SEO is the practice of optimizing for the entities (people, places, concepts, products) that search engines and AI systems may recognize or associate with your content, rather than only for keyword strings. It is the post-2020 evolution of classic keyword SEO toward semantic search, complementary to keyword work rather than replacing it.
GPT· Plx Cld Cop Gem
foundational
Featured snippets
Featured snippets are the boxed answers Google displays at the top of search results for question-form queries. They are the SERP feature AEO grew up around, increasingly subsumed by AI Overview in 2024–2026.
GPT0× Plx0× Cld0× Cop0× Gem0×
intermediate
Freshness signals
Freshness signals are the metadata inputs AI engines and search engines use to assess how recent a piece of content is: datePublished, dateModified, version history, and on-page recency markers. Empirical data shows most AI assistants (ChatGPT, Perplexity, Gemini, Copilot) prefer fresher content than Google's organic SERP does, with the notable exception of Google's AI Overview, which cites slightly older content on average.
GPT Plx· Cld Cop Gem
intermediate
Knowledge Graph
A Knowledge Graph is a structured network of entities (people, places, concepts, products) and their relationships. It is one important mechanism (alongside web search, embeddings, retrieval systems, and model knowledge) that AI and search systems may use to resolve entities, disambiguate queries, and organize facts.
GPT0× Plx0× Cld0× Cop0× Gem0×
intermediate
Pillar content
Pillar content is a long-form, comprehensive page that anchors a topic cluster, surrounded by spoke pages that drill into specific sub-topics and link back to the pillar. In the AI-search era, pillar content's role has expanded from PageRank concentration toward topic coverage, entity clarity, internal discoverability, and retrieval-friendly organization.
GPT· Plx0× Cld0× Cop0× Gem0×
foundational
Search Generative Experience (SGE)
Search Generative Experience (SGE) was Google's labs-stage AI search feature launched in May 2023, the precursor to AI Overview. Renamed and graduated to AI Overview at Google I/O in May 2024.
GPT0× Plx0× Cld0× Cop0× Gem0×
foundational
Topic clusters
Topic clusters are a content architecture pattern: one pillar page covers a topic broadly, multiple spoke pages drill into sub-topics, all interlinked. Originated as classic SEO methodology; increasingly adapted for AI-search topic coverage, entity clarity, and retrieval-friendly content organization.
GPT· Plx Cld Cop Gem
Infrastructure (6)
Cluster pillar
AI access control
AI access control is the set of signals a site uses to govern how AI systems fetch it, identify themselves to it, and use its content: robots.txt, llms.txt, AIPREF, and Web Bot Auth.
GPT0× Plx0× Cld0× Cop0× Gem0×
intermediate
AI crawler bots
AI crawler bots are user agents operated by AI search engines (GPTBot, OAI-SearchBot, PerplexityBot, ClaudeBot and others) that fetch web content for training, retrieval, and user-initiated browsing. Several engines also publish robots.txt control tokens (Google-Extended, Applebot-Extended) that do not crawl but signal opt-out of model training.
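To see how per-agent rules resolve, the sketch below parses a made-up robots.txt with Python's standard urllib.robotparser and checks three of the tokens named above. The policy itself is only an example; which agent serves which purpose should be confirmed against each vendor's documentation.

```python
from urllib.robotparser import RobotFileParser

# Example policy only: block two tokens, explicitly allow a third.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: Google-Extended
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

page = "https://example.com/glossary/"
for agent in ("GPTBot", "OAI-SearchBot", "Google-Extended"):
    print(agent, parser.can_fetch(agent, page))
```

For a control token that does not crawl, the result simply reflects the rule a compliant operator is expected to honor; robots.txt declares a preference and does not enforce it.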
GPT· Plx0× Cld0× Cop0× Gem0×
advanced
AIPREF (AI usage preferences)
AIPREF is the IETF AI Preferences working group's effort to standardize a machine-readable way for content owners to express how their content may be used by AI systems. The preference is carried by a Content-Usage signal, attached as an HTTP response header or a robots.txt rule, using a small vocabulary (currently the categories train-ai and search, each set to y or n). AIPREF declares a usage preference; it does not authenticate the requester (out of scope) and does not enforce compliance.
GPT0× Plx0× Cld0× Cop0× Gem0×
intermediate
IndexNow Protocol
IndexNow is an open protocol for instant URL notification to participating search engines. Site owners host a key file at a public URL on their site and POST changed URLs to a single endpoint; the receiving engine relays the notification to every other participant. It has been adopted by Bing, Yandex, Naver, Seznam, and Yep, but not by Google.
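A submission can be sketched with the standard library alone. The JSON field names (host, key, keyLocation, urlList) and the shared api.indexnow.org endpoint follow the published protocol as best understood here; the host, key, and URLs are placeholders, and the request is built but deliberately not sent.

```python
import json
import urllib.request

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def build_indexnow_request(host: str, key: str, urls: list[str]) -> urllib.request.Request:
    """Build (but do not send) an IndexNow bulk-submission request."""
    payload = {
        "host": host,
        "key": key,
        # Assumes the key file is hosted at the site root as <key>.txt.
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": urls,
    }
    return urllib.request.Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


request = build_indexnow_request(
    "example.com",
    "hypothetical-key-123",
    ["https://example.com/glossary/citation-share"],
)
print(request.get_method(), request.full_url)
# Passing `request` to urllib.request.urlopen would submit it.
```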
GPT0× Plx Cld Cop Gem
intermediate
LLMS.txt
LLMS.txt is a proposed text file at the root of a website, written in Markdown, that provides a curated, AI-readable summary of the site's most important resources for large language models. It is a community-proposed companion-style file, not a replacement for robots.txt or sitemap.xml, which serve different purposes (crawl control, URL discovery) and are widely standardized in ways llms.txt is not.
GPT0× Plx0× Cld0× Cop0× Gem0×
advanced
Web Bot Auth
Web Bot Auth is an emerging IETF-track standard for cryptographically verifying that an HTTP request comes from a declared bot operator. It applies RFC 9421 HTTP Message Signatures (Backman, Richer, Sporny; Proposed Standard, February 2024) to crawler traffic: each request is signed with the bot's Ed25519 private key, and the verifier fetches the matching public key from a well-known directory at /.well-known/http-message-signatures-directory. The standard addresses the crawler-controllability gap that user-agent strings and reverse DNS leave open, particularly for AI search crawlers that spoof browser UAs or rotate through residential and datacenter IPs.
GPT0× Plx0× Cld0× Cop0× Gem0×
Methodology (3)
intermediate
Citation probe protocol
A citation probe protocol is the standardized operating procedure for measuring whether AI engines cite a publisher's content. It locks down query design, cadence, engine coverage, recording schema, disambiguation rules, and signal-vs-noise thresholds, turning ad-hoc 'ask ChatGPT and see' into a repeatable, comparable, vendor-neutral measurement program. Practitioner-coined methodology entry; the cluster's foundational SOP for the six citation-metrics anchors.
GPTPlx·CldCopGem·
intermediate
External traffic disambiguation
External traffic disambiguation is a practitioner-coined methodology for distinguishing real external visitors to a website from the site owner's own browsing, headless-browser scrapers, AI training crawlers, and VPN edge artifacts. The framework uses five orthogonal axes (foreign edge / cache state / path pattern / UA-plus-referer / non-scraper UA pattern) read off server logs (such as Vercel Logs) and applied jointly. Used as a publisher-side method when traditional analytics tools cannot reliably separate AI-citation-driven traffic from bot noise.
GPT·PlxCldCopGem
advanced
Position-Adjusted Word Count
Position-Adjusted Word Count (PAWC) is the metric in Aggarwal et al. 2023's GEO paper that scores how much of an AI engine's answer is drawn from a given source, weighting earlier-positioned text more heavily. It is the number behind nearly every '+40% GEO visibility' claim, but it measures word-count share under single-actor 2023 conditions, not citation rate or ranking.
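A small sketch of how such a score could be computed, assuming each answer sentence's word count is attributed to the sources it cites, weighted by an exponential decay exp(-pos / N) over the sentence position, and divided by the answer's total word count. The decay form, the zero-based positions, and the input shape are assumptions here and should be checked against the paper before use.

```python
import math


def position_adjusted_word_count(sentences, source):
    """Score one source against an answer.

    sentences: list of (text, cited_sources) pairs in answer order,
    where cited_sources is a set of source identifiers.
    """
    n = len(sentences)
    total_words = sum(len(text.split()) for text, _ in sentences)
    if total_words == 0:
        return 0.0
    weighted = sum(
        len(text.split()) * math.exp(-pos / n)  # earlier sentences weigh more
        for pos, (text, cited) in enumerate(sentences)
        if source in cited
    )
    return weighted / total_words


# Two four-word sentences, each citing a different hypothetical source.
answer = [
    ("IndexNow notifies search engines.", {"A"}),
    ("Google has not adopted.", {"B"}),
]
score_a = position_adjusted_word_count(answer, "A")
score_b = position_adjusted_word_count(answer, "B")
```

Both sources contribute the same number of words here, yet the earlier-cited one scores higher, which is why the metric reflects word share and position rather than how often a source is cited.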
GPTPlxCldCopGem
AI behavior (7)
intermediate
Citation hallucination
Citation hallucination is when an AI system produces a citation to a source that does not exist. It is distinct from citing a real source inaccurately (a citation-precision failure) and from answering with no grounding at all (a hallucination-grounding failure).
GPTPlxCldCopGem
advanced
Citation precision and recall
Citation precision is the fraction of citations in an AI engine's response that actually support the sentence they are attached to. Citation recall is the fraction of generated sentences that are fully supported by their citations. Both are model-behavior metrics, not publisher-visibility metrics: they measure how faithfully an AI engine uses the sources it cites, not how often a publisher's content appears as a source.
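The two definitions can be sketched as follows, assuming each generated sentence has already been annotated (by a human or a judge model) with whether it is fully supported and whether each attached citation supports it. The input shape and sample annotations are hypothetical.

```python
def citation_precision_recall(annotated_sentences):
    """Compute (precision, recall) from per-sentence annotations.

    annotated_sentences: list of (fully_supported, citation_supports),
    where citation_supports holds one boolean per citation attached to
    the sentence.
    """
    total_citations = sum(len(cites) for _, cites in annotated_sentences)
    supporting = sum(sum(cites) for _, cites in annotated_sentences)
    supported_sentences = sum(1 for ok, _ in annotated_sentences if ok)

    precision = supporting / total_citations if total_citations else 0.0
    recall = (
        supported_sentences / len(annotated_sentences)
        if annotated_sentences
        else 0.0
    )
    return precision, recall


# Made-up annotations: three sentences, three citations in total.
annotations = [
    (True, [True, True]),   # fully supported by both citations
    (False, [False]),       # cited, but the citation does not support it
    (False, []),            # no citation at all
]
precision, recall = citation_precision_recall(annotations)
```

In this example two of three citations support their sentence, while only one of three sentences is fully supported.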
GPT·Plx·Cld0×Cop0×Gem·
advanced
Context rot
Context rot is the empirically observed degradation in an LLM's output quality as its input context grows longer, even on simple tasks and well below the model's maximum context window. Formalized by Chroma's 2025 study across 18 models, it is distinct from context-window overflow (hitting the hard token limit) and broader than lost in the middle (which is specifically positional): context rot is degradation along the length axis. For publishers, it reinforces that retrieval pipelines pass only a bounded, high-signal context to the model, so concise, self-contained passages survive better than verbose padding.
GPT0×Plx0×Cld0×Cop0×Gem0×
intermediate
Hallucination grounding
Hallucination grounding is the practice of pinning AI-generated responses to retrieved source documents. It reduces the risk of plausible-sounding but unsupported claims by encouraging or enforcing source traceability for key claims, depending on the system. Grounding mitigates hallucination; it does not eliminate it entirely.
GPT0×Plx0×Cld0×Cop0×Gem0×
advanced
Lost in the Middle
Lost in the middle is the empirically observed tendency of large language models to use information at the beginning and end of a long input context more reliably than information positioned in the middle, producing a U-shaped accuracy curve. Documented by Liu et al. 'Lost in the Middle' (2023), it is one evidence-backed mechanism that helps explain many 'front-load your key content' recommendations, and is distinct from retrieval-side position weighting (which does not exist in BM25 or embedding ranking).
GPT0×Plx0×Cld0×Cop0×Gem0×
intermediate
Prompt injection
Prompt injection is an attack class where adversarial text is placed where a language model reads it as a command rather than as data, overriding the model's intended task. Direct injection is typed into the prompt by the user; indirect prompt injection (Greshake et al. 2023) hides the instructions in content the model later retrieves (a page, a document, a search result), so the attacker never touches the model directly. The indirect variant is the security mirror of GEO: it rides the same retrieval path that optimization targets.
GPTPlxCldCopGem
intermediate
Sycophancy vs cite-able fact
Sycophancy is the LLM failure mode of producing agreeable, hedge-laden, or context-flattering responses at the expense of factual specificity. Cite-able fact production is a separate content-writing pattern that emphasizes specific, attributed, falsifiable claims. The two are not strict opposites, but understanding both is useful for AI-search content writers.
GPT0×Plx·Cld0×Cop0×Gem0×
Umbrella terms (4)
foundational
AI Search Optimization
AI Search Optimization (AIO) is the umbrella term for optimizing content visibility across all AI-driven search surfaces: generative engines, answer engines, and emerging agentic browse experiences.
GPT0×Plx0×Cld0×Cop0×Gem0×
foundational
Answer Engine Optimization
Answer Engine Optimization (AEO) is the practice of structuring content to be selected as the direct answer in AI-driven and traditional answer engines: featured snippets, voice assistants, and chat-based search.
GPTPlx0×CldCopGem
foundational
Generative Engine Optimization
Generative Engine Optimization (GEO) is the practice of structuring web content so that generative AI search engines like ChatGPT, Perplexity, Claude, and Google's AI Overview cite it as a primary source.
GPT0×Plx0×Cld0×Cop0×Gem0×
foundational
LLM Optimization (LLMO)
LLM Optimization (LLMO) is the practice of making content easier for LLM-based systems to retrieve, parse, summarize, and attribute accurately when answering user prompts. It covers runtime retrieval influence (clearer, more actionable for publishers) and training-data influence (indirect, mostly unobservable, debated).
GPT0×Plx0×Cld0×Cop0×Gem0×