Topic clusters
Citation status
Last checked 2026-05-20
What are topic clusters?
A topic cluster is a content architecture pattern: a long-form pillar page broadly covers a topic, and a set of narrower spoke pages each drill into one sub-topic. The pillar links out to every spoke; every spoke links back to the pillar. The structure was popularized by HubSpot's content marketing framework around 2017 [1] and became standard SEO practice through the 2020s.
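The pillar-to-spoke and spoke-to-pillar rule is easy to audit mechanically. Below is a minimal Python sketch. It assumes you already have each page's internal outbound links as a dict of sets (for example, from a crawl export); the function name and URLs are hypothetical.

```python
def missing_cluster_links(
    pillar: str, spokes: list[str], links: dict[str, set[str]]
) -> list[tuple[str, str]]:
    """Return (source, target) pairs that hub-and-spoke wiring requires but `links` lacks."""
    missing = []
    for spoke in spokes:
        # The pillar should link out to every spoke.
        if spoke not in links.get(pillar, set()):
            missing.append((pillar, spoke))
        # Every spoke should link back to the pillar.
        if pillar not in links.get(spoke, set()):
            missing.append((spoke, pillar))
    return missing


if __name__ == "__main__":
    # Hypothetical crawl output: page URL -> internal URLs it links to.
    links = {
        "/topic-clusters": {"/pillar-content", "/entity-seo"},
        "/pillar-content": {"/topic-clusters"},
        "/entity-seo": set(),  # spoke is missing its back-link
        "/defined-term-schema": {"/topic-clusters"},
    }
    spokes = ["/pillar-content", "/entity-seo", "/defined-term-schema"]
    for src, dst in missing_cluster_links("/topic-clusters", spokes, links):
        print(f"missing link: {src} -> {dst}")
```

Running it flags the spoke without a back-link and the spoke the pillar never links to. Missing entries in the dict are treated as pages with no outbound links, so an uncrawled spoke shows up as a missing back-link rather than raising an error.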
The classical SEO rationale was PageRank concentration: many internal links flowing toward the pillar boosted its ranking authority. In the AI search era, the value proposition has shifted toward topical clarity, retrieval-friendly content organization, and entity coherence: a well-built cluster can make topical relationships clearer and may improve the likelihood that engines retrieve relevant pillar or spoke pages for queries about the topic. Practitioners commonly observe clustered content earning citations across both pillar and spokes more reliably than equivalent isolated pages, though the independent effect of cluster structure vs cluster-level content quality has not been isolated by public study, and citation behavior varies by engine, query, and source authority.
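To make the classical rationale concrete, the sketch below runs textbook PageRank (power iteration, damping 0.85) on a toy cluster with one pillar and four spokes. It models only the original link-analysis idea, not how any current search or AI engine ranks pages. The graph and page names are hypothetical.

```python
def pagerank(
    links: dict[str, list[str]], damping: float = 0.85, iterations: int = 100
) -> dict[str, float]:
    """Textbook PageRank by power iteration; every link target must also be a key."""
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Teleport share: every page gets (1 - d) / N each round.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for source, targets in links.items():
            if not targets:
                # Dangling page: spread its rank evenly across all pages.
                for node in nodes:
                    new_rank[node] += damping * rank[source] / n
                continue
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    spokes = [f"spoke-{i}" for i in range(1, 5)]
    # Pillar links to every spoke; every spoke links back to the pillar.
    graph = {"pillar": spokes, **{s: ["pillar"] for s in spokes}}
    for page, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
        print(f"{page:8s} {score:.3f}")
```

With this wiring the pillar converges to roughly 0.48 of the total score and each spoke to about 0.13. That imbalance is the "concentration" the PageRank-era playbooks relied on. The calculation says nothing about passage-level retrieval in AI search.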
Status in 2026
Mainstream, but interpreted differently than in the 2017–2019 SEO playbooks, when HubSpot's framework first dominated practice. Spoke pages now need to be cite-able as standalone answers. Embedding-based retrieval and RAG systems typically chunk content at paragraph or sentence boundaries and score retrieved passages individually, so thin spokes that exist only as PageRank funnels may underperform in AI-search retrieval [2]. Exact chunking and scoring behavior varies by engine and is not vendor-documented (see the sub-document retrieval entry for parallel discussion). The 2026 best practice combines the original hub-and-spoke structure with DefinedTermSet-style schema markup for glossary-style clusters. This can make the cluster's collective scope more explicit and machine-readable at the schema layer. However, no public study has isolated whether engines treat schema-marked clusters as categorically distinct from interlinked content with the same topical coverage.
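The following minimal Python sketch illustrates passage-level scoring. It makes two assumptions: chunking happens at blank lines, and a simple lexical-overlap score stands in for the embedding similarity that real engines use. No engine documents its pipeline this way, and the function names and sample text are hypothetical. The sketch shows why a thin funnel spoke can score zero for a relevant question: each passage is judged only on its own words.

```python
import re


def chunk_paragraphs(page: str) -> list[str]:
    """Split a page into paragraph-level passages at blank lines (an assumed chunking rule)."""
    return [p.strip() for p in re.split(r"\n\s*\n", page) if p.strip()]


def terms(text: str) -> set[str]:
    """Lowercased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def passage_score(query: str, passage: str) -> float:
    """Share of query terms found in the passage: a lexical stand-in for embedding similarity."""
    q = terms(query)
    return len(q & terms(passage)) / len(q) if q else 0.0


def best_passage(query: str, page: str) -> tuple[float, str]:
    """Score each passage on its own and keep the best; the page as a whole is never scored."""
    scored = ((passage_score(query, p), p) for p in chunk_paragraphs(page))
    return max(scored, default=(0.0, ""))


if __name__ == "__main__":
    query = "how does DefinedTermSet link each DefinedTerm"
    thin_spoke = "Learn more about schema in our complete guide.\n\nSee the pillar page for details."
    standalone_spoke = (
        "DefinedTermSet schema groups glossary terms.\n\n"
        "A DefinedTermSet links each DefinedTerm through the hasDefinedTerm property."
    )
    for name, page in [("thin", thin_spoke), ("standalone", standalone_spoke)]:
        score, passage = best_passage(query, page)
        print(f"{name}: {score:.2f} | {passage}")
```

Here the thin spoke's best passage scores 0.00 and the standalone spoke's best passage scores 0.50. In this toy model, only passage wording matters. The page's place in the cluster does not.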
How to apply
Topic clusters are signal-stacking applied across a content family rather than a single page. Three concrete moves:
- Define the spoke set before writing the pillar: list 15–30 sub-topics that fall under the pillar's domain. Each becomes a candidate spoke. A pillar without a defined spoke set tends to become a sprawling page that ranks for nothing in particular.
- Make every spoke independently cite-able: each spoke should answer a real user query and stand alone in retrieval. Don't write spokes as funnel bait; write them as standalone answers that happen to live in a cluster. FAQPage JSON-LD remains a valid schema vocabulary for question-and-answer structure where the spoke has real Q&A content, but no longer earns SERP rich results (Google fully deprecated FAQ rich results for all sites on May 7, 2026; see the FAQ schema entry); ship it for the underlying Q&A structure's machine readability rather than for SERP visual treatment.
- Wire the cluster with DefinedTermSet for terminology hubs: if your cluster is a glossary or jargon set, mark the pillar as DefinedTermSet and link each DefinedTerm spoke via hasDefinedTerm. This can make the cluster's collective scope more explicit and machine-readable at the schema layer. It does not guarantee that engines will treat the cluster as a recognized entity collection. Recognition still depends on consistency, source trust, and the kind of content-level signals discussed on the entity-based SEO and knowledge graph entries.
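Here is a minimal sketch of the DefinedTermSet wiring, generating JSON-LD in Python. The schema.org types and properties it uses (DefinedTermSet, DefinedTerm, hasDefinedTerm, inDefinedTermSet) are real vocabulary. The glossary name, URLs, and descriptions are hypothetical placeholders, and the exact object shape is one reasonable choice rather than a required template.

```python
import json


def defined_term_set_jsonld(set_name: str, set_url: str, terms: list[dict[str, str]]) -> dict:
    """Mark a glossary pillar as a schema.org DefinedTermSet whose spokes are DefinedTerms."""
    return {
        "@context": "https://schema.org",
        "@type": "DefinedTermSet",
        "@id": set_url,
        "name": set_name,
        "url": set_url,
        # Pillar -> spoke, the schema-layer mirror of the pillar's outbound links.
        "hasDefinedTerm": [
            {
                "@type": "DefinedTerm",
                "@id": term["url"],
                "name": term["name"],
                "description": term["description"],
                "url": term["url"],
                # Spoke -> pillar, the schema-layer mirror of each spoke's back-link.
                "inDefinedTermSet": set_url,
            }
            for term in terms
        ],
    }


if __name__ == "__main__":
    # Hypothetical glossary; replace names, descriptions, and URLs with your own.
    markup = defined_term_set_jsonld(
        "Example glossary",
        "https://example.com/terms/",
        [
            {
                "name": "Topic clusters",
                "description": "A pillar page plus narrower, interlinked spoke pages.",
                "url": "https://example.com/terms/topic-clusters",
            },
            {
                "name": "Pillar content",
                "description": "The hub page at the center of a topic cluster.",
                "url": "https://example.com/terms/pillar-content",
            },
        ],
    )
    print('<script type="application/ld+json">')
    print(json.dumps(markup, indent=2))
    print("</script>")
```

Giving each term an @id equal to its spoke URL keeps the schema graph aligned with the link graph. Validate the output with a structured-data testing tool before shipping.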
What to skip: classical "PageRank funnel" thinking. The 2026 cluster goal is topical clarity and retrieval coverage, not link equity concentration. Spokes that exist only to feed the pillar tend to dilute rather than strengthen the cluster.
How to measure cluster effect on AI citation
Because the lift from cluster structure (vs cluster-level content quality) is not isolated by public study, the only reliable answer for any specific topic is to measure it directly. A practitioner protocol:
- Fix a cluster-level prompt set before build-out: pick 15–30 user-question prompts that cover the pillar topic plus its main sub-topics. Lock the set, because rotating prompts breaks comparability across the measurement window.
- Record citations per page-role separately: for each prompt, log whether the pillar was cited, whether any spoke was cited, whether a brand mention appeared without citation, and which engine surfaced what. Aggregating "the cluster was cited" hides whether the pillar is doing the work, the spokes are, or only one outlier spoke is.
- Compute cluster-level citation share and attribution rate: roll up to the topic-cluster level rather than treating each URL independently; this is what makes the cluster framing operationally distinct from individual-page tracking (see the citation share and attribution rate entries).
- Compare before vs after material build-out over 4–8 weeks: take a measured Day-0 baseline when the spoke set ships, then re-probe weekly. Treat the trend line as the signal, not any single probe.
- For Microsoft Copilot specifically: Bing Webmaster Tools' AI Performance dashboard (public preview since 2026-02-10) surfaces per-page citation counts and grounding queries, the only vendor-native measurement source for any AI surface as of mid-2026.
This loop turns "cluster lifts citation" from a confident causal claim into a verifiable observation specific to your topic, audience, and engine mix.
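Below is a minimal Python sketch of the roll-up step. The Probe log schema is hypothetical, and so are the metric definitions. Citation share is taken here as the fraction of probes in which any cluster page (pillar or spoke) was cited. Attribution rate is taken as the fraction of probes with any brand visibility (a citation or an uncited mention) that included a citation. Check these definitions against the citation share and attribution rate entries before relying on them.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """One run of one locked prompt on one engine (hypothetical log schema)."""
    week: int  # weeks since the spoke set shipped; <= 0 means baseline
    engine: str
    prompt: str
    pillar_cited: bool
    spokes_cited: frozenset[str] = frozenset()
    brand_mentioned: bool = False  # brand named in the answer, cited or not

    @property
    def cluster_cited(self) -> bool:
        return self.pillar_cited or bool(self.spokes_cited)


def rollup(probes: list[Probe]) -> dict[str, float]:
    """Cluster-level metrics, keeping pillar and spoke roles separate."""
    n = len(probes)
    if n == 0:
        return {"citation_share": 0.0, "pillar_share": 0.0, "spoke_share": 0.0, "attribution_rate": 0.0}
    cited = sum(p.cluster_cited for p in probes)
    uncited_mentions = sum(p.brand_mentioned and not p.cluster_cited for p in probes)
    visible = cited + uncited_mentions
    return {
        "citation_share": cited / n,
        "pillar_share": sum(p.pillar_cited for p in probes) / n,
        "spoke_share": sum(bool(p.spokes_cited) for p in probes) / n,
        "attribution_rate": cited / visible if visible else 0.0,
    }


def spoke_counts(probes: list[Probe]) -> Counter:
    """Citations per spoke URL, to spot a cluster carried by one outlier spoke."""
    return Counter(url for p in probes for url in p.spokes_cited)


def weekly_share(probes: list[Probe]) -> dict[int, float]:
    """Cluster citation share per week: the trend line, not any single probe."""
    by_week: dict[int, list[Probe]] = defaultdict(list)
    for p in probes:
        by_week[p.week].append(p)
    return {week: rollup(batch)["citation_share"] for week, batch in sorted(by_week.items())}


def before_after_delta(probes: list[Probe]) -> float:
    """Mean weekly share after launch minus mean weekly share at baseline."""
    trend = weekly_share(probes)
    before = [s for w, s in trend.items() if w <= 0]
    after = [s for w, s in trend.items() if w > 0]
    if not before or not after:
        raise ValueError("need both baseline and post-launch probes")
    return sum(after) / len(after) - sum(before) / len(before)


if __name__ == "__main__":
    log = [
        Probe(0, "engine-a", "what is a topic cluster", False, brand_mentioned=True),
        Probe(0, "engine-a", "how many spokes per pillar", False),
        Probe(1, "engine-a", "what is a topic cluster", True),
        Probe(1, "engine-a", "how many spokes per pillar", False, frozenset({"/terms/pillar-content"}), True),
    ]
    print(rollup(log))
    print(spoke_counts(log))
    print(weekly_share(log), before_after_delta(log))
```

Filter the log by engine before calling these functions if you want per-engine numbers. The before/after delta is a descriptive comparison of weekly means, not a causal estimate.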
How it relates to other concepts
- The hub-and-spoke pattern's hub is pillar content: pillar is the page, cluster is the structure.
- Plausible contributor to Knowledge Graph signals when the cluster is wired with consistent Organization + DefinedTermSet schema. The independent effect of cluster structure on KG entity recognition, compared with equivalent un-clustered topical coverage, has not been isolated.
- The content-architecture layer of entity-based SEO: entity-based SEO is the "how" of marking entities; topic clusters are the "where" (the page-set scope across which the entity work compounds).
- For terminology clusters specifically, DefinedTerm and DefinedTermSet schema form the schema-layer backbone.
- Operates through sub-document retrieval at the engine side: clusters matter for AI search partly because retrieval scores passages individually, so a well-structured spoke is more retrievable than an equivalent paragraph buried in the pillar.
- Direct content-strategy enabler of GEO at scale. Single pages compete passage-by-passage; clusters give the topic a larger surface area at the passage level.
Footnotes
[1] HubSpot's original framing of topic clusters and pillar pages, the framework that popularized the hub-and-spoke content pattern (2017). blog.hubspot.com/marketing/topic-clusters-seo.
[2] Aggarwal et al. "GEO: Generative Engine Optimization." arXiv:2311.09735, November 2023. Tests 9 LLM-prompted content-modification methods at source-page level against a Position-Adjusted Word Count (PAWC) visibility metric; top performers include Quotation Addition (PAWC 27.2 vs the no-modification baseline of 19.3, ~41% relative gain), Statistics Addition (~31%), Fluency Optimization (~28%), and Cite Sources (~27%); these per-method percentages are derived from the paper's position-adjusted PAWC scores (the "Overall" column; the un-adjusted Word sub-column reads 27.8 / 25.9 / 25.1 / 24.9) against the 19.3 baseline, while the paper's own Results section names a 30-40% gain for its top-3 (Cite Sources, Quotation Addition, Statistics Addition). The abstract's "up to 40%" is the rounded form of the best method's ~41% position-adjusted gain (Quotation Addition, 27.2 over baseline 19.3). The paper does not distinguish pillar vs spoke pages (those are SEO/GEO concepts, not paper terminology); the editorial inference that individual spoke pages (not just the pillar) need to be cite-ready is a glossary extension of the paper's source-level findings, not a paper conclusion. Counter-evidence: a 2025 follow-up benchmark [3] tested 7 of these 9 methods in multi-actor production-realistic conditions and found most largely ineffective or slightly negative on citation ranking; the 2023 PAWC effect sizes remain valid for the single-actor synthetic testbed but set an empirical upper bound, not a production prediction.
[3] See the C-SEO Bench glossary entry for the full paper attribution (Puerto, Gubri, Green, Oh, Yun. "C-SEO Bench: Does Conversational SEO Work?" arXiv:2506.11097, NeurIPS 2025 Datasets & Benchmarks Track), method-by-method results, multi-actor evaluation methodology, and the full verbatim findings.
FAQ
- How is a topic cluster different from pillar content?
- Pillar content is the hub page at the center of the cluster; the topic cluster is the full hub-and-spoke structure (pillar + spokes + their interlinking). The terms are often used interchangeably in marketing copy, but the pillar is one component of a cluster.
- How many spoke pages per pillar?
- Practitioners commonly land mature clusters in the 8–20 spoke range, but this is a heuristic; no engine has published a target, and the right number depends on how much real user-question surface the topic has. Quality matters more than count. Each spoke should answer a real user query independently, not exist only to support the pillar.
- Is the topic cluster pattern still useful in AI search?
- Yes, with a shifted value proposition. In classical SEO it concentrated PageRank on the pillar. In AI search the value shifts to topical clarity, internal discoverability, and passage-level usefulness; a well-built cluster may improve the likelihood that engines retrieve and cite relevant pillar or spoke pages for the topic. No public study has isolated whether the cluster structure itself (as opposed to cluster-level content quality) independently lifts citation rates. Measure it directly with the cluster measurement protocol in this entry rather than assuming the lift.