Does adding random statistics increase citation?

Only when the statistics are relevant, accurate, sourced, and non-redundant. Number-stuffing is the same kind of anti-pattern that keyword stuffing was for classic SEO. The Aggarwal et al. 2023 paper tested adding *naturally fitting* quantitative statistics; it gives no evidence that fabricated, off-topic, decorative, or stale statistics earn the same effect. (The paper did not test what happens when fabricated stats are added; the claim that engines actively downweight conflicting stats is a plausible inference, not a paper finding.)

What is a good statistical density target?

The Aggarwal et al. 2023 paper does NOT define a specific ratio. What the paper actually did was apply an LLM-prompted method called Statistics Addition to source content and measure the resulting change in Position-Adjusted Word Count, the paper's citation visibility metric. Statistics Addition raised PAWC from a baseline of 19.3 to 25.2 (~31%) in their experiment. Practitioners interpret this as 'add accurate, sourced statistics where they naturally fit', not as a specific per-block ratio target.

Should I cite the source of each statistic?

Yes. The same Aggarwal et al. paper found Cite Sources (PAWC 24.6, ~27% above baseline) was among the top-performing methods alongside Statistics Addition. The paper did not report statistics plus attribution as its strongest pairing (that was Fluency Optimization + Statistics Addition), but both methods rank in its top four by PAWC, so pairing each number with its source is consistent with the evidence.

Which numbers should count as 'statistics' and which should not?

Tightest reading: verifiable quantitative claims with a source and ideally a date. Sourced statistics (38% of cited pages came from top 10 per Ahrefs 2026), dollar amounts ($10B market per Source X), explicit ranks/proportions (top 10 of 100), and dated events (May 7, 2026) all count. Decorative numbers (5 tips, 7 ways), vague big-number claims (millions of users, no source), and word-count quotas (300-word answer block) do not count toward the empirical evidence pattern the paper studied.

Statistical Density

Statistical density is a practitioner-coined shorthand for the content property that the Aggarwal et al. 2023 GEO paper's 'Statistics Addition' method tries to increase: presence of verifiable statistics, dates, and numerical claims. The term itself and any specific ratio definition are practitioner-derived, not paper measurements.

Citation status

ChatGPT Perplexity Claude · Copilot Gemini

Last checked 2026-05-30

Statistical density is a practitioner-coined shorthand for the content property that the Aggarwal et al. 2023 GEO paper's "Statistics Addition" method tries to increase: presence of verifiable statistics, dates, and numerical claims¹. The term itself is not defined in the paper, and any specific ratio definition (such as "numerical statements / total sentences") is post-hoc practitioner framing, not a paper measurement.

What the paper actually did, in one sentence: it applied an LLM-prompted content-modification method called Statistics Addition to source content ("modifies content to include quantitative statistics instead of qualitative discussion, wherever possible") and measured the change in Position-Adjusted Word Count (PAWC), the paper's primary citation-visibility metric in generative engine responses. Statistics Addition raised PAWC from a baseline of 19.3 to 25.2 (~31% relative gain) across the paper's evaluation queries.

This is a method-based intervention finding, not a property-based correlation. The paper did NOT measure how naturally statistic-dense content compares to naturally qualitative content. It measured what happens when you actively rewrite content to include more statistics. The distinction matters: the paper supports "actively edit to add statistics where naturally relevant," not "naturally write content with high statistic density and citation will follow."

The paper offers no mechanistic explanation for why adding statistics improves citation. One plausible inference (editorial, not from the paper): AI engines may prefer numerical claims because they are easier to ground against retrieved sources than qualitative assertions. Treat this as speculation, not a paper finding.

Status in 2026

Statistical density is widely referenced as a practitioner concept, but the paper-grounded version (Statistics Addition as an intervention) is rarely cited correctly. Most published guides treat "statistical density" as if the paper defined it as a measurable content property with a target ratio; the paper did neither.

The original paper tested 9 GEO methods at the source-content layer. By PAWC, the top 4 were Quotation Addition (PAWC 27.2, ~41%), Statistics Addition (25.2, ~31%), Fluency Optimization (24.7, ~28%), and Cite Sources (24.6, ~27%); Statistics Addition ranked 2nd of 9. The paper's §5.3 (Figure 4 heatmap, 200-example subset) reports that the combination of Fluency Optimization + Statistics Addition outperformed any single GEO method by more than 5.5%, the strongest of the pairwise combinations measured.
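
The percentages quoted above follow from simple arithmetic on the Table 1 "Overall" values listed in footnote 1. A minimal Python sketch of that calculation, assuming those values are transcribed correctly in this entry; the names `BASELINE`, `PAWC`, `relative_gain`, and `ranked_gains` are illustrative and do not come from the paper:

```python
# Table 1 "Overall" position-adjusted word count values, as quoted in this entry.
BASELINE = 19.3
PAWC = {
    "Quotation Addition": 27.2,
    "Statistics Addition": 25.2,
    "Fluency Optimization": 24.7,
    "Cite Sources": 24.6,
    "Technical Terms": 22.7,
    "Easy-to-Understand": 22.0,
    "Authoritative": 21.3,
    "Unique Words": 20.5,
    "Keyword Stuffing": 17.7,
}


def relative_gain(score: float, baseline: float = BASELINE) -> float:
    """Percentage change of a method's score relative to the baseline."""
    return (score - baseline) / baseline * 100.0


def ranked_gains(scores: dict) -> list:
    """Return (method, gain rounded to 0.1) pairs, highest gain first."""
    pairs = [(name, round(relative_gain(value), 1)) for name, value in scores.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


for name, gain in ranked_gains(PAWC):
    print(f"{name}: {gain:+.1f}%")
```

The gains are relative changes in the paper's visibility metric on its own testbed; the sketch reproduces the rounding, not a citation-rate prediction.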

Whether the 25.2 PAWC result for Statistics Addition reproduces on 2026 commercial AI engines (ChatGPT-5, Perplexity, Claude, Copilot, Gemini) has not been isolated by public study, and the gap between the paper's intervention metric and any practitioner ratio measure remains untested by direct evidence. The score is an experimental signal under the paper's specific testbed, not a citation-rate multiplier for any current engine.

Working assumption (in the absence of direct 2026 measurement): treat statistical density as the content property that the paper-tested Statistics Addition intervention increases, not as a standalone citation lever. Adopt sourced numerical specificity for editorial credibility and reader scannability; expect any citation lift to be measurable only through your own testing.

The number-stuffing risk is the main 2026 practitioner failure mode. Several SEO blogs frame statistical density as "hard data points every 150-300 words" without distinguishing relevant sourced statistics from decorative numbers; this risks becoming the AI-search-era analog of keyword stuffing. The paper's finding does not support that framing.

Counter-evidence (C-SEO Bench 2025): A multi-actor follow-up benchmark² specifically tested Statistics Addition (relabelled "Statistics" in the benchmark paper) across 6 domains. The Statistics method showed -0.07 ±1.00 rank change on retail under multi-actor adoption (slightly negative), not the ~31% PAWC lift seen on the 2023 benchmark. The C-SEO Bench result does not invalidate Aggarwal's PAWC 25.2 figure (that remains valid for the 2023 single-actor synthetic testbed), but it sets a striking empirical upper bound under citation-ranking conditions. Of all 5 directly tested Aggarwal methods, the Statistics result is the strongest single-number counter-evidence the benchmark surfaced. See C-SEO Bench for the full multi-actor methodology, the comparison to traditional retrieval-ranking SEO, and the zero-sum framing.

How to apply

The Aggarwal et al. paper supports active editing to add relevant, sourced statistics where they naturally fit. It does not support adding statistics for their own sake or hitting a specific density ratio. The practical writing rule:

Wherever a sentence has a "many" / "most" / "often" qualifier, try replacing it with a sourced number that genuinely supports the claim: "many SaaS companies struggle with X" → "a 2024 Gartner survey found 67% of SaaS CMOs cite X as the top challenge." The full pattern is number + source + date + context, not the number alone.
Front-load the number, then the source: AI engines often quote the first 1-2 sentences of a paragraph more than the body. Lead with the statistic, then the citation, then the elaboration, not the other way around. Example: "In Ahrefs' March 2026 study of 863K SERPs, 37.9% of Google AI Overview cited pages also ranked in the organic top 10³."
Audit existing pages for "qualifier hot spots": search your content for "many", "most", "often", "some", "several". Each is an opportunity to add a sourced number, but only where a real number exists; do not invent numbers to fill the slot.
Pair Statistics Addition with Fluency Optimization where possible: the paper found this combination outperforms any single GEO method by >5.5%. Practically: when you add a sourced statistic, also tighten the surrounding prose for clarity. A clearly-written sentence with a sourced number is more cite-able than either alone.
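
The "qualifier hot spots" audit above can be sketched as a small script. This is one possible approach, not a prescribed tool: the qualifier list is the one named in the audit step, while the function name `find_qualifier_hot_spots` and the sample text are hypothetical.

```python
import re

# Qualifier list taken from the audit step above; extend it for your own content.
QUALIFIERS = ("many", "most", "often", "some", "several")
_PATTERN = re.compile(r"\b(" + "|".join(QUALIFIERS) + r")\b", re.IGNORECASE)


def find_qualifier_hot_spots(text: str) -> list:
    """Return (line_number, qualifier, line) for each vague qualifier found."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            hits.append((lineno, match.group(1).lower(), line.strip()))
    return hits


# Hypothetical sample text, for illustration only.
sample = (
    "Many SaaS companies struggle with onboarding.\n"
    "Churn is often blamed on pricing.\n"
    "Something else entirely."
)

for lineno, word, line in find_qualifier_hot_spots(sample):
    print(f"line {lineno}: '{word}' -> {line}")
```

Each hit is only a candidate: replace the qualifier when a real, sourced number exists, and leave it alone otherwise.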

What to skip:

Padding pages with statistics that do not support a specific claim. Decorative numbers ("5 tips", "7 ways") and vague big-number claims ("millions of users" without source) do not count toward the empirical evidence pattern the paper studied.
Targeting a specific density ratio (such as "1 stat per 150 words"). The paper did not test a ratio; SEO blogs that report ratios are extrapolating.
Adding inaccurate, off-topic, or stale statistics. The paper did not measure this case directly, but a plausible inference is that AI engines retrieving from a corpus may downweight claims that conflict with other retrieved sources. Treat this as practitioner caution, not a paper finding.
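
As a rough aid for the skip list above, the sketch below flags sentences whose numbers look decorative, vague, or unsourced. It is a crude heuristic under stated assumptions: the regular expressions, the cue words, and the label names are all hypothetical choices, and the check cannot tell whether a statistic is accurate, relevant, or current.

```python
import re

# Hypothetical heuristics; tune the word lists for your own content.
LISTICLE = re.compile(r"\b\d+\s+(?:tips|ways|tricks|steps|reasons|hacks)\b", re.IGNORECASE)
VAGUE_BIG = re.compile(r"\b(?:thousands|millions|billions)\s+of\b", re.IGNORECASE)
HAS_NUMBER = re.compile(r"\d")
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")
SOURCE_CUE = re.compile(
    r"\b(?:per|according to|found|reports?|reported|study|survey)\b", re.IGNORECASE
)


def classify_sentence(sentence: str) -> str:
    """Label a sentence by how its numeric content is presented."""
    if LISTICLE.search(sentence):
        return "decorative"            # e.g. "5 tips", "7 ways"
    if VAGUE_BIG.search(sentence) and not SOURCE_CUE.search(sentence):
        return "vague"                 # e.g. "millions of users" with no source
    if HAS_NUMBER.search(sentence):
        if YEAR.search(sentence) and SOURCE_CUE.search(sentence):
            return "sourced-and-dated"  # number + source cue + year
        return "needs-source-or-date"
    return "no-number"


# Example sentences adapted from this entry's own illustrations.
examples = [
    "Here are 5 tips for better onboarding.",
    "Millions of users trust us.",
    "A 2024 Gartner survey found 67% of SaaS CMOs cite X as the top challenge.",
    "Our plan costs 30% less.",
]
for sentence in examples:
    print(f"{classify_sentence(sentence)}: {sentence}")
```

A "sourced-and-dated" label only means the sentence has the surface shape of number + source + date; checking the figure against its source remains a manual step.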

How it relates to other concepts

The underlying intervention is one of nine GEO methods tested in Aggarwal et al. 2023. The paper's top 4 by PAWC (Quotation Addition, Statistics Addition, Fluency Optimization, Cite Sources) cover the strongest content-level interventions isolated under its test conditions. Statistics Addition is the second-strongest single method and the second half of the paper's strongest measured pair (with Fluency Optimization; see that entry for the combined-intervention details).
Distinct from Authoritative Statement Strength, which was a separately tested method (PAWC 21.3, +10%) in the same paper. Sourced numerical specificity and authoritative voice are different interventions.
Input signal for cite-ability: a passage with sourced statistics is generally easier to extract and quote than a passage with vague qualifiers.
Tactical implementation tied to passage-level optimization: statistics distribute across passages and are best placed in the lead of each.
Companion writer discipline to Definition-Lead Style: a definition-lead opener carrying a sourced statistic is both extractable and quantitatively grounded; the pattern compounds at the answer-block level.
A differentiator from content marketing prose, which is often qualitatively framed; but the differentiator is the sourced specificity of statistics, not the raw count.

¹ Aggarwal et al. "GEO: Generative Engine Optimization." arXiv:2311.09735, November 2023 (KDD 2024). Princeton + IIT Delhi + Georgia Tech + Allen Institute for AI. The paper tested 9 LLM-prompted content-modification methods against a Position-Adjusted Word Count (PAWC) visibility metric on the GEO-bench benchmark. Table 1 position-adjusted PAWC values (the "Overall" sub-column, which is the metric the headline gains are computed on): Quotation Addition 27.2, Statistics Addition 25.2, Fluency Optimization 24.7, Cite Sources 24.6, Technical Terms 22.7, Easy-to-Understand 22.0, Authoritative 21.3, Unique Words 20.5, no-modification baseline 19.3, Keyword Stuffing 17.7. (Table 1 nests three sub-columns under "Position-Adjusted Word Count": Word / Position / Overall; the un-adjusted plain Word sub-column reads 27.8 / 25.9 / 25.1 / 24.9 / 23.1 / 22.2 / 21.8 / 20.7 / 19.5 / 17.8, which earlier versions of this entry cited as "PAWC" in error.) The paper's verbatim Results section names a top-3 (Cite Sources, Quotation Addition, Statistics Addition) with a "30-40% relative improvement" range; standalone Table 1 PAWC ranking adds Fluency Optimization as the 3rd highest standalone score, but the paper does not include Fluency in its named top-3 (Cite Sources appears there for combined-method strength despite ranking 4th standalone). Per-engine results vary: Table 5 (Perplexity.ai) reports a different baseline of 24.0 and Quotation Addition at +22%, not the main bench's 30-40% range. For Statistics Addition specifically: ranked #2 of 9 by standalone PAWC AND included in the paper's verbatim named top-3; also the second half of the paper's only isolated strongest combination pair (with Fluency Optimization, more than 5.5% over any single method, §5.3 Figure 4 heatmap on 200-example subset).
The paper applies it as an LLM-prompted intervention adding numerical specificity to source content; this glossary entry's "Statistical Density" name reframes the finding as the content property the intervention increases, not as a stand-alone ratio metric the paper measured. The paper does NOT define a "statistical density" metric, does NOT prescribe a ratio of numerical statements per sentence, and does NOT offer a mechanistic explanation for why statistics help. Testbed: GPT-3.5-turbo, top-5 Google sources, 2023. Primary-source re-verified 2026-05-30 against the ar5iv HTML mirror of arXiv:2311.09735: all Table 1 PAWC values, Table 1 caption verbatim, Section 4 prose, the verbatim named top-3 quote, and Table 5 Perplexity.ai per-engine numbers (including Keyword Stuffing 21.9 with paper prose 'performs 10% worse than the baseline') confirmed.
² See the C-SEO Bench glossary entry for the full paper attribution (Puerto, Gubri, Green, Oh, Yun. "C-SEO Bench: Does Conversational SEO Work?" arXiv:2506.11097, NeurIPS 2025 Datasets & Benchmarks Track), method-by-method results, multi-actor evaluation methodology, and the full verbatim findings.
³ Ahrefs blog, "Update: 38% of AI Overview Citations Pull From the Top 10." ahrefs.com/blog/ai-overview-citations-top-10, March 2, 2026. Study covered 4M URLs across 863K SERPs (Feb-Mar 2026): AI Overview citations distribute 37.9% top-10 organic / 31.2% positions 11-100 / 31.0% beyond top-100. An earlier Ahrefs analysis had reported a higher top-10 share (~76%); the revised lower figure is attributed to improved parsing methodology + query fan-out shift since the prior study.

Last fact-checked 2026-05-30. Spotted an error or stale claim? See editorial methodology.

Changelog (9 entries)

2026-06-21: Corrected the Aggarwal Table 1 figures: the values previously cited as PAWC (Statistics Addition 25.9 vs baseline 19.5, and the rest) were the paper's plain Word Count sub-column. Updated to the paper's actual position-adjusted Word Count (the 'Overall' column: Statistics Addition 25.2 vs baseline 19.3, about +31%; Quotation Addition 27.2 about +41%; Keyword Stuffing 17.7 about -8%), which is the metric the paper's headline gains are computed on. Rankings and the named top-3 framing are unchanged (Statistics Addition still 2nd by standalone score).
2026-05-30: Primary-source re-verification of the cluster's shared Aggarwal Table 1 PAWC numbers + verbatim quotes against the ar5iv mirror of arXiv:2311.09735. All Table 1 PAWC values, Table 1 caption verbatim, Section 4 prose, the named top-3 quote, and Table 5 Perplexity per-engine numbers (including Keyword Stuffing 21.9 with paper prose 'performs 10% worse than the baseline') confirmed. Aggarwal footnote appended with the re-verification note; no body changes.
2026-05-30: PAWC labeling sweep. Aggarwal footnote now explicitly labels values as 'Table 1 main GEO-bench' and includes all 9 + baseline. Surfaces the paper's verbatim named top-3 (Cite Sources, Quotation Addition, Statistics Addition) at the 30-40% range, distinct from the standalone PAWC ranking that placed Fluency in the cluster's prior 'top-4' framing. Statistics Addition is in both rankings. Table 5 (Perplexity.ai) per-engine caveat added (different baseline 24.0, best method +22%).
2026-05-23: Cluster template alignment. Corrected the top-PAWC ranking (Top 3 by PAWC is Quotation Addition / Statistics Addition / Fluency Optimization, Cite Sources #4; prior text swapped Cite Sources and Fluency). Fixed 'best single intervention' wording (a combination is not single) and tightened the §5.3 reference (Figure 4, 200-example subset). Added the 2026-commercial-engines replication caveat. Inline-linked Quotation Addition, Cite Sources, and Fluency. Ahrefs example corrected to 37.9%. relatedTerms expanded 3 to 9. Sources block synced to all four Aggarwal institutions.
2026-05-21: First confirmed external Google-search click landed on this entry (2026-05-21, Singapore edge, Google referrer; all five traffic-attribution axes pass). The third practitioner-coined anchor to attract a confirmed external Google click (after cite-ability and citation-share), making the pattern multi-entry: practitioner-coined empty-territory terms reliably attract organic Google clicks. Recorded as the fifth confirmed external visit.
2026-05-16: Reframed 'statistical density' as a practitioner-coined shorthand rather than a metric defined in the Aggarwal et al. 2023 paper. The paper actually tests a content edit called Statistics Addition, measured against the Position-Adjusted Word Count (PAWC) metric, not a sentence-level density ratio. Page rewritten to use correct PAWC numbers, distinguish intervention vs correlation, and add anti-stuffing discipline (relevant + sourced + non-redundant statistics only).
2026-05-13: Initial publish
2026-05-30: Claude citation confirmed for the Aggarwal vendor-research query ('What did Aggarwal 2023's GEO paper say about Statistics Addition as a content method?'). The entry surfaced as a primary cited source in Claude's desktop answer; it also surfaced as the cited source for the parallel Cite Sources Aggarwal query, a cross-entry signal.
2026-06-05: Level reclassified advanced->intermediate under the published level rubric (prerequisite-knowledge depth): it is a practitioner shorthand for a simple content property (adding quantitative specificity), readable from the foundational layer rather than an advanced-prerequisite topic.