All terms
The 2026 vocabulary of Generative Engine Optimization, with live per-term citation status across ChatGPT, Perplexity, Claude, and Copilot.
GEO content methods (2)
advanced
Black-hat C-SEO
Black-hat C-SEO is the use of adversarial techniques (most notably prompt injection hidden in page content) to manipulate an AI engine's ranking or citation behavior through deception rather than genuine content quality. It is the adversarial counterpart to white-hat C-SEO, which improves a page's actual clarity and usefulness. Beyond likely violating many platform terms, black-hat C-SEO is detectable, unreliable as models and defenses evolve, and a poor bet given that even the white-hat methods tested in C-SEO Bench show limited measured effect.
GPT0×Plx0×Cld0×Cop0×Gem·
advanced
C-SEO Bench
C-SEO Bench is the Puerto et al. 2025 NeurIPS Datasets & Benchmarks paper that evaluates 9 Conversational Search Engine Optimization methods across 6 domains, two tasks (question answering and product recommendation), and continuous multi-actor adoption rates. Its headline finding is that most current C-SEO methods are largely ineffective once tested outside the single-actor synthetic conditions of prior GEO benchmarks. A traditional retrieval-ranking SEO baseline (moving the source to context position 1) is roughly 7.6× more effective in their retail-domain measurement than the best C-SEO method tested.
GPT0×Plx0×Cld0×Cop0×Gem·
Retrieval pipeline (6)
advanced
BM25
BM25 (Best Matching 25) is a probabilistic ranking function used by classical search engines and the lexical layer of modern hybrid retrieval systems. It is the standard mechanism for scoring exact-keyword match in search retrieval; its application inside specific commercial AI search engines is not vendor-documented but is consistent with observable lexical-signal behavior.
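As a rough illustration of the scoring described above, here is a minimal sketch of the Okapi BM25 formula over pre-tokenized documents. The parameter defaults (k1=1.5, b=0.75), the smoothed IDF variant, and the toy documents are assumptions chosen for the example; production engines differ in tokenization and tuning.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF; the "1 +" keeps the value non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Toy corpus (illustrative only).
docs = [
    "bm25 ranks documents by keyword match".split(),
    "vector embeddings capture semantic similarity".split(),
    "keyword search uses an inverted index".split(),
]
scores = bm25_scores("keyword match".split(), docs)
```

The first document matches both query terms and scores highest; the second shares no terms with the query and scores zero, which is the exact-keyword behavior the definition refers to.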
GPT·Plx0×CldCopGem
advanced
Context assembly
Context assembly is the stage between retrieval and generation where an AI system selects, orders, and packs retrieved passages into the model's context window before it generates an answer. In the retrieve-then-generate pipeline (RAG, Lewis et al. 2020), retrieval finds candidate passages; assembly decides which of them actually enter the prompt, in what order, within the token budget. Position effects such as lost-in-the-middle (Liu et al. 2023) take hold at this stage, so assembly, not retrieval alone, decides whether your passage is actually used.
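One simple way to picture the select, order, and pack step is a greedy packer. This is a sketch under stated assumptions, not how any particular engine works: the function name assemble_context is hypothetical, relevance scores are taken as given, and token counts are approximated by whitespace word counts.

```python
def assemble_context(passages, token_budget):
    """Pick the highest-scoring passages that fit the budget, best first.

    passages: list of (score, text) pairs from a retriever.
    Token cost is approximated by whitespace word count (an assumption).
    """
    selected, used = [], 0
    for score, text in sorted(passages, key=lambda p: p[0], reverse=True):
        cost = len(text.split())
        if used + cost > token_budget:
            continue  # skip passages that would overflow the budget
        selected.append(text)
        used += cost
    return selected

candidates = [
    (0.91, "BM25 is a probabilistic ranking function."),
    (0.40, "This long passage rambles on and on about several loosely related topics."),
    (0.75, "Hybrid retrieval combines lexical and semantic search."),
]
context = assemble_context(candidates, token_budget=14)
```

With a budget of 14 words, the two short, high-scoring passages are packed and the long, low-scoring one is dropped even though it was retrieved.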
GPTPlxCldCopGem
advanced
Hybrid retrieval
Hybrid retrieval combines lexical (keyword) search with semantic (vector embedding) search in a single ranking pipeline. It is a common production pattern in modern retrieval and RAG systems and is likely used in some form by many AI-search products, though individual vendors rarely disclose their full retrieval architecture.
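The definition above does not say how the two result lists are merged. One widely used option is reciprocal rank fusion (RRF), sketched here as an illustration; the constant k=60 is a conventional default, and the document IDs are made up for the example.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

lexical = ["doc_a", "doc_b", "doc_c"]    # e.g. from a keyword retriever
semantic = ["doc_b", "doc_d", "doc_a"]   # e.g. from an embedding retriever
fused = reciprocal_rank_fusion([lexical, semantic])
# fused == ["doc_b", "doc_a", "doc_d", "doc_c"]
```

Documents that rank well in both lists rise to the top, while documents found by only one retriever still appear lower in the fused ranking.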
GPTPlx·CldCopGem
advanced
Inverted index
An inverted index is the data structure classical search engines use to look up which documents contain a given term. It is the foundation under BM25 ranking and a common lexical layer in many modern hybrid retrieval systems.
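A minimal sketch of the structure, assuming whitespace tokenization and lowercasing (real engines add stemming, positions, and compression): each term maps to the set of document IDs that contain it, so a multi-term lookup becomes a set intersection.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

docs = {
    1: "Hybrid retrieval systems",
    2: "Classical retrieval with BM25",
    3: "Vector embeddings",
}
index = build_inverted_index(docs)

# Documents containing both terms: intersect the two posting sets.
matches = index["retrieval"] & index["bm25"]
# matches == {2}
```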
GPT·PlxCldCopGem
advanced
Reranking
Reranking is the second-stage retrieval step where a more expensive model rescores the top-N candidates returned by an initial retriever. It is the technique that makes hybrid retrieval feasible at scale: a cheap first stage casts a wide net for recall, and the costly model is spent only on that short candidate list, where it improves precision.
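The two-stage shape can be sketched as follows. Both scorers are stand-ins chosen for the example: the first stage counts shared terms, and expensive_score uses Jaccard similarity purely as a placeholder for a costly model such as a cross-encoder. All function names here are hypothetical.

```python
def first_stage(query, docs, top_n):
    """Cheap, recall-oriented retriever: rank by number of shared terms."""
    q = set(query.lower().split())
    scored = [(len(q & set(d.lower().split())), d) for d in docs]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: s[0], reverse=True)  # stable: ties keep input order
    return [d for _, d in scored[:top_n]]

def expensive_score(query, doc):
    """Placeholder for a costly model (assumption): Jaccard similarity."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d)

def rerank(query, docs, top_n=3, top_k=1):
    """Rescore only the top_n first-stage candidates and keep the best top_k."""
    candidates = first_stage(query, docs, top_n)
    ranked = sorted(candidates, key=lambda d: expensive_score(query, d), reverse=True)
    return ranked[:top_k]

docs = [
    "reranking models are one of many kinds of models used in search systems today",
    "reranking models rescore candidates",
    "cooking pasta at home",
]
best = rerank("reranking models", docs)
# best == ["reranking models rescore candidates"]
```

The first stage cannot tell the two matching documents apart, since both share two query terms. The second-stage scorer, applied to just those two candidates, promotes the focused one.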
GPT0×Plx0×Cld0×Cop0×Gem0×
advanced
Retrievability
Retrievability is an information-retrieval measure (Azzopardi & Vinay 2008) of how easily a document can be retrieved across a whole population of queries: the more queries that return it, and the higher its rank, the more retrievable it is. In AI search it names the upstream lever that content optimization skips: whether the engine's retrieval step can find your page and pull it into the answer at all. The GEO evidence suggests this may be a more durable lever than isolated in-page rewrites.
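A small sketch of the cumulative form of the measure, where a document earns one point for every query that returns it within a rank cutoff. The equal query weights, the cutoff of 2, and the sample rankings are assumptions for illustration.

```python
def retrievability(rankings, cutoff=2):
    """Count, per document, the queries that rank it within the cutoff.

    rankings: dict mapping each query to its ranked list of document IDs.
    Every query carries equal weight (an assumption).
    """
    scores = {}
    for ranked_docs in rankings.values():
        for doc_id in ranked_docs[:cutoff]:
            scores[doc_id] = scores.get(doc_id, 0) + 1
    return scores

rankings = {
    "what is bm25": ["page_a", "page_b", "page_c"],
    "hybrid retrieval": ["page_b", "page_a", "page_c"],
    "reranking explained": ["page_b", "page_c", "page_a"],
}
r = retrievability(rankings, cutoff=2)
# r == {"page_a": 2, "page_b": 3, "page_c": 1}
```

Here page_b is the most retrievable page: every query in the sample surfaces it within the top two results, regardless of how its content is worded.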
GPTPlxCldCopGem
Infrastructure (2)
advanced
AIPREF (AI usage preferences)
AIPREF is the IETF AI Preferences working group's effort to standardize a machine-readable way for content owners to express how their content may be used by AI systems. The preference is carried by a Content-Usage signal, attached as an HTTP response header or a robots.txt rule, using a small vocabulary (currently the categories train-ai and search, each set to y or n). AIPREF declares a usage preference; it does not authenticate the requester (out of scope) and does not enforce compliance.
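To make the signal concrete, here is a hedged sketch of reading such a preference on the consuming side. The serialization assumed here (comma-separated key=value pairs such as train-ai=n, search=y) is an illustration based on the categories named above; the drafts are still evolving, so check the current AIPREF documents for the exact syntax. The function name parse_content_usage is hypothetical.

```python
def parse_content_usage(value):
    """Parse an assumed 'key=y|n, key=y|n' Content-Usage value into booleans.

    Unknown or malformed items are ignored rather than guessed at.
    """
    prefs = {}
    for item in value.split(","):
        key, sep, val = item.strip().partition("=")
        if sep and val in ("y", "n"):
            prefs[key.strip()] = (val == "y")
    return prefs

# Example value: disallow AI training, allow search use (illustrative).
prefs = parse_content_usage("train-ai=n, search=y")
# prefs == {"train-ai": False, "search": True}
```

The parser only reports the declared preference. Consistent with the definition above, nothing in it identifies the requester or enforces the result.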
GPT0×Plx0×Cld0×Cop0×Gem0×
advanced
Web Bot Auth
Web Bot Auth is an emerging IETF-track standard for cryptographically verifying that an HTTP request comes from a declared bot operator. It applies RFC 9421 HTTP Message Signatures (Backman, Richer, Sporny; Proposed Standard, February 2024) to crawler traffic: each request is signed with the bot's Ed25519 private key, and the verifier fetches the matching public key from a well-known directory at /.well-known/http-message-signatures-directory. The standard addresses the crawler-controllability gap that user-agent strings and reverse DNS leave open, particularly for AI search crawlers that spoof browser UAs or rotate through residential and datacenter IPs.
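The signing step operates on a canonical string that RFC 9421 calls the signature base. The sketch below builds a simplified base for a single covered component; it is an illustration only, and the key ID, timestamp, and the helper name signature_base are hypothetical. The actual Ed25519 signing is omitted because it needs a cryptography library outside the Python standard library.

```python
def signature_base(components, params):
    """Build a simplified RFC 9421-style signature base.

    components: ordered list of (name, value) pairs covered by the signature.
    params: ordered dict of signature parameters (ints or strings).
    """
    covered = " ".join(f'"{name}"' for name, _ in components)
    rendered = "".join(
        f";{key}={val}" if isinstance(val, int) else f';{key}="{val}"'
        for key, val in params.items()
    )
    lines = [f'"{name}": {value}' for name, value in components]
    # The final line lists what was covered plus the signature parameters.
    lines.append(f'"@signature-params": ({covered}){rendered}')
    return "\n".join(lines)

base = signature_base(
    [("@authority", "example.com")],
    {"created": 1735689600, "keyid": "hypothetical-key-id", "alg": "ed25519"},
)
```

A bot would sign the bytes of base with its private key and send the result in the request's signature headers; the verifier rebuilds the same string from the request it received and checks the signature against the public key from the directory.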
GPT0×Plx0×Cld0×Cop0×Gem0×
Methodology (1)
AI behavior (3)
advanced
Citation precision and recall
Citation precision is the fraction of citations in an AI engine's response that actually support the sentence they are attached to. Citation recall is the fraction of generated sentences that are fully supported by their citations. Both are model-behavior metrics, not publisher-visibility metrics: they measure how faithfully an AI engine uses the sources it cites, not how often a publisher's content appears as a source.
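Both metrics reduce to simple fractions once each citation and each sentence has been judged. In this sketch the True/False judgments are assumed to come from human annotators or an entailment model; the sample values are invented for illustration.

```python
def citation_precision(citation_supports):
    """Fraction of citations that support the sentence they are attached to."""
    if not citation_supports:
        return 0.0
    return sum(citation_supports) / len(citation_supports)

def citation_recall(sentence_supported):
    """Fraction of generated sentences fully supported by their citations."""
    if not sentence_supported:
        return 0.0
    return sum(sentence_supported) / len(sentence_supported)

# One response with 4 citations and 5 sentences (illustrative judgments).
precision = citation_precision([True, True, False, True])   # 3 of 4 citations
recall = citation_recall([True, False, True, True, False])  # 3 of 5 sentences
# precision == 0.75, recall == 0.6
```

Note that the two fractions have different denominators: precision is computed over citations, recall over sentences, so a response can score high on one and low on the other.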
GPT·Plx·Cld0×Cop0×Gem·
advanced
Context rot
Context rot is the empirically observed degradation in an LLM's output quality as its input context grows longer, even on simple tasks and well below the model's maximum context window. Formalized by Chroma's 2025 study across 18 models, it is distinct from context-window overflow (hitting the hard token limit) and broader than lost in the middle (which is specifically positional): context rot is degradation along the length axis. For publishers it reinforces that a retrieval pipeline keeps only a bounded, high-signal context, so concise self-contained passages survive better than verbose padding.
GPT0×Plx0×Cld0×Cop0×Gem0×
advanced
Lost in the Middle
Lost in the middle is the empirically observed tendency of large language models to use information at the beginning and end of a long input context more reliably than information positioned in the middle, producing a U-shaped accuracy curve. Documented by Liu et al. 'Lost in the Middle' (2023), it is one evidence-backed mechanism that helps explain many 'front-load your key content' recommendations, and is distinct from retrieval-side position weighting (which does not exist in BM25 or embedding ranking).
GPT0×Plx0×Cld0×Cop0×Gem0×