What GEO content methods actually work?

In the foundational public GEO benchmark (Aggarwal et al. 2023), the paper itself names three as effective: adding sourced quotations, adding statistics, and citing sources. Adding authoritative tone produced no significant improvement, and keyword stuffing was negative. But even the three are weak: they were measured as a one-shot change to a single page under 2023 single-actor conditions, and a 2025 multi-actor re-test (C-SEO Bench) found most content methods largely ineffective once many publishers adopt them. The durable lever is being retrievable and self-contained, not any single content tactic.

Does keyword stuffing work for AI search?

No. It is the one method the GEO paper found negative: keyword stuffing scored below the no-modification baseline (PAWC 17.7 vs 19.3), and the paper describes the result verbatim as 'little to no performance improvement'; on the Perplexity.ai per-engine table it performed about 10% worse than the baseline. The 2025 C-SEO Bench multi-actor re-test reinforces the broader null/negative territory for mechanical content-side tactics. Stuffing is the clearest example of a traditional SEO tactic that does not transfer to AI search.

Is the GEO paper's '30 to 40% improvement' number real?

It is the headline range for the paper's three named methods on its main GEO-bench, under single-actor synthetic conditions, measured as Position-Adjusted Word Count (share of the answer drawn from a source), not citation rate. It is not a universal multiplier: the paper's own Perplexity.ai table shows a best method of only about +22%, and the 2025 C-SEO Bench re-test found the same methods do not improve citation ranking under multi-actor conditions. Treat any single quoted PAWC number as a single-actor signal, not a guaranteed lift.


GEO content methods

Cluster pillar

GEO content methods are the content-rewriting techniques (adding quotations, statistics, source citations, fluency, authoritative tone, keywords) tested in Aggarwal et al. 2023's GEO paper. The honest evidence: the paper names only three as effective, rates authoritative tone and keyword stuffing as null or negative, and a 2025 multi-actor re-test found most largely ineffective once everyone adopts them.

Citation status

ChatGPT 0× · Perplexity cited · Claude 0× · Copilot 0× · Gemini cited

Last checked 2026-07-06

What are GEO content methods?

GEO content methods are the content-rewriting techniques meant to make a page more likely to be cited by generative AI search: adding sourced quotations, inserting statistics, citing sources inline, improving fluency, asserting authoritative tone, and adding keywords. They come from the foundational GEO paper, Aggarwal et al. 2023, which tested nine such modifications against a benchmark¹.

The honest summary the field rarely gives: most of these are weak levers, and two are not levers at all. Authoritative tone shows no significant improvement under the benchmark, and keyword stuffing is negative. In the original public GEO benchmark, the paper itself names only three methods as effective; a 2025 multi-actor re-test (C-SEO Bench) then found that most content methods are largely ineffective once many publishers adopt them, with a traditional retrieval-position baseline about 7.6 times stronger than the best content method on the retail domain². The durable lever is being retrievable and self-contained, not any single content tactic. This page is the one-page evidence view; each method has its own entry with the full detail.

The evidence

Aggarwal et al. measured nine LLM-prompted content modifications on the paper's main GEO-bench, scoring each by Position-Adjusted Word Count (PAWC), the share of the generated answer drawn from a target source. The no-modification baseline is PAWC 19.3 (Table 1's position-adjusted "Overall" column; the un-adjusted plain Word sub-column baseline is 19.5). The methods this glossary tracks:

Method	PAWC (position-adjusted "Overall", baseline 19.3)	Paper's own verdict
Quotation Addition	27.2 (+41%)	In the paper's named top-3
Statistics Addition	25.2 (+31%)	In the named top-3; strongest pair with Fluency
Fluency Optimization	24.7 (+28%)	3rd by score but NOT in the named top-3
Cite Sources	24.6 (+27%)	In the named top-3 (combined-method strength)
Technical Terms	22.7 (+18%)	modest; not tracked as a separate entry here
Easy-to-Understand	22.0 (+14%)	modest (C-SEO Bench renames it "Simple Language")
Authoritative	21.3 (+10%)	"no significant improvement" (null, not a lift)
Unique Words	20.5 (+6%)	near-null
Keyword Stuffing	17.7 (−8%)	"little to no performance improvement" (negative)

The six linked methods have their own entries; the three unlinked rows (Technical Terms, Easy-to-Understand, Unique Words) are the rest of the paper's nine, included so this is the full evidence view, not a curated subset.
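The percentage column in the table is plain arithmetic against the 19.3 baseline, and it can be reproduced directly. The sketch below uses only the PAWC values quoted on this page; the function and variable names are illustrative, not anything from the paper.

```python
# PAWC values as quoted in the table above (position-adjusted "Overall" column).
BASELINE = 19.3

PAWC = {
    "Quotation Addition": 27.2,
    "Statistics Addition": 25.2,
    "Fluency Optimization": 24.7,
    "Cite Sources": 24.6,
    "Technical Terms": 22.7,
    "Easy-to-Understand": 22.0,
    "Authoritative": 21.3,
    "Unique Words": 20.5,
    "Keyword Stuffing": 17.7,
}


def relative_lift(score: float, baseline: float = BASELINE) -> float:
    """Relative change versus the no-modification baseline, in percent."""
    return (score - baseline) / baseline * 100.0


# Round to whole percentages, matching the precision used in the table.
lifts = {method: round(relative_lift(score)) for method, score in PAWC.items()}

for method, pct in lifts.items():
    print(f"{method:<22} {pct:+d}%")
```

A positive number here is only a raw delta on a single-actor benchmark; it says nothing about whether the paper judged the difference significant, which is why the verdict column is kept separate from the score column.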

The paper's verbatim Results section names its effective set: "our top-performing methods, namely Cite Sources, Quotation Addition, and Statistics Addition, achieved a relative improvement of 30-40% on the Position-Adjusted Word Count metric." (The Table 1 arithmetic for Quotation Addition alone, 27.2 against 19.3, is +41%, slightly above the paper's own "30-40%" range; that range is the paper's aggregate characterization of the named three, not a per-method bound.) Note what is not there: Fluency Optimization, despite ranking third by standalone score, is not in the named top-3 (a common vendor error is to insert it). Cite Sources is named despite its fourth-place standalone score; the paper does not say why, so this page reports the naming and the table order separately rather than inferring a reason.

Two results are the scarcest and most useful, because every vendor "top GEO tactics" page omits them. Authoritative tone is a null result: its raw PAWC of 21.3 is above the 19.3 baseline, but the paper states verbatim "to the contrary we find no significant improvement," and this page follows the paper's own verdict rather than reading the raw delta as a lift. So the popular claim that confident phrasing is a primary citation lever has no empirical support in the original benchmark. Keyword stuffing is negative: it is the one method that scored below the baseline.

What actually transfers: single-actor vs multi-actor

The Aggarwal numbers are a single-actor synthetic measurement: one page is optimized while the rest of the web holds still, on GPT-3.5-turbo in 2023, scored by word-count share. The 2025 C-SEO Bench re-tested seven of these methods under multi-actor adoption rates, on gpt-4o-mini and claude-3-5-haiku, scored by citation ranking (which source the engine cites first)². Its headline, verbatim: "most current C-SEO methods are largely ineffective, contrary to reported results in the literature". The published abstract puts it more strongly still, saying the methods are "not only largely ineffective but also frequently have a negative impact on document ranking" (the abstract's phrase; the paper's operative metric is the rank of the source in the model's output). Only 3 of 54 method-domain cells reached significance, and the traditional retrieval-ranking baseline (moving the source up in the model's context) was about 7.6 times more effective than the best content method on the retail domain. In plain terms, moving the source higher in the retrieved context mattered far more than rewriting the source text.
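For context on the "3 of 54" figure: the C-SEO Bench footnote reports that significance was assessed with a Bonferroni-Holm correction at p<0.05. The sketch below shows how a Holm step-down test works in general. The p-values are invented for illustration and are not data from the paper, and `holm_reject` is a hypothetical helper name.

```python
def holm_reject(p_values, alpha=0.05):
    """Return a list of booleans: True where Holm's step-down test rejects."""
    m = len(p_values)
    # Visit p-values from smallest to largest, remembering original positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The smallest p-value faces alpha / m, the next alpha / (m - 1), and so on.
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, every larger p-value fails too
    return reject


# Hypothetical p-values for illustration only; NOT results from C-SEO Bench.
example = [0.0004, 0.03, 0.012, 0.2, 0.0009]
print(holm_reject(example))
```

In this made-up example the value 0.03 would pass an uncorrected 0.05 threshold but fails after correction. With 54 cells tested at once, that stricter bar is part of why so few cells count as significant.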

The two papers are not in conflict, which is the load-bearing point. They measured different things: word-count share under single-actor conditions versus citation ranking under multi-actor conditions. C-SEO Bench even notes that Aggarwal's own data, read for citation ranking, "implicitly" supports the null. And the multi-actor dynamic is zero-sum: any lift a method captures today erodes toward zero as competitors adopt the same technique. So the honest conclusion is that content methods are weak levers measured under conditions that do not hold once everyone uses them; they are best treated as evidence-hygiene and extractability improvements, not durable competitive moats.
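The zero-sum point can be made concrete with a toy model. This is a minimal sketch under loudly stated assumptions, not the method of either paper: each retrieved source gets a weight of 1.0, a source that applies a content method has its weight multiplied by a hypothetical `boost`, and a source's share of the answer is its weight divided by the total. The numbers (five sources, a 1.4 boost) are invented for illustration.

```python
def target_share(n_sources: int, boost: float, adopters: int) -> float:
    """Share of the answer held by the target page, which always uses the method.

    Toy assumption: every source has weight 1.0, multiplied by `boost` if it
    has applied the content method. `adopters` counts competitors (not the
    target) that have also applied it.
    """
    if not 0 <= adopters <= n_sources - 1:
        raise ValueError("adopters must be between 0 and n_sources - 1")
    total = boost * (1 + adopters) + 1.0 * (n_sources - 1 - adopters)
    return boost / total


N, BOOST = 5, 1.4   # hypothetical: five retrieved sources, a 40% weight boost
baseline = 1 / N    # the target's share when nobody optimizes
shares = [target_share(N, BOOST, k) for k in range(N)]

for k, share in enumerate(shares):
    print(f"{k} competitors adopt: share {share:.3f} (baseline {baseline:.3f})")
```

Under these assumptions the target's share starts above the baseline when it is the only adopter, shrinks with each competitor that copies the method, and returns exactly to the baseline once everyone has adopted it. The single-actor measurement corresponds to the first row only.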

How to apply

Use the three named methods, but as hygiene, not as a guaranteed lift. Sourced quotations, real statistics, and inline citations are the least-weak content levers and they make a page genuinely better; apply them because they help a reader and a model extract a clean, attributable claim, not because of a headline percentage.
Drop the two that do not work. Authoritative tone is a null result and keyword density is negative; spending effort on confident phrasing or keyword counts is optimizing for something the original benchmark says does not move citation.
Put the effort upstream, in retrievability. The single strongest finding is that, in C-SEO Bench's retail domain, retrieval position beat the best content method by about 7.6 times. Index discipline, clean canonicals, fast response, topical authority, and self-contained passages are the dominant levers; content methods are secondary multipliers on top.
Distrust the common misquotes. "30-40%" is the main-bench headline for three named methods under single-actor PAWC; the paper's own Perplexity.ai table tops out near +22%. And Fluency Optimization is not in the paper's named top-3, whatever a vendor table says.

How it relates to other concepts

This is the cluster pillar for GEO content methods; the comparison table above is the single source of truth, and each method (Quotation Addition, Statistics Addition, Cite Sources, Fluency Optimization, Authoritative, Keyword Stuffing) has its own entry with the full per-method detail and footnotes.
C-SEO Bench is the peer-reviewed multi-actor counter-evidence anchor; Generative Engine Optimization is the practice these methods sit inside.
The honest conclusion (durable lever is retrievability, not content tactics) hands off to the retrieval-pipeline cluster, where the mechanisms that decide what gets retrieved and cited live.

¹ Aggarwal et al. "GEO: Generative Engine Optimization." arXiv:2311.09735, November 2023 (KDD 2024). Princeton + IIT Delhi + Georgia Tech + Allen Institute for AI. Tests 9 LLM-prompted content-modification methods against a Position-Adjusted Word Count (PAWC) metric on the GEO-bench benchmark. Table 1 position-adjusted PAWC values (the "Overall" sub-column, which is the metric the headline gains are computed on): Quotation Addition 27.2, Statistics Addition 25.2, Fluency Optimization 24.7, Cite Sources 24.6, Technical Terms 22.7, Easy-to-Understand 22.0, Authoritative 21.3, Unique Words 20.5, no-modification baseline 19.3, Keyword Stuffing 17.7. (Table 1 nests three sub-columns under "Position-Adjusted Word Count": Word / Position / Overall; the un-adjusted plain Word sub-column reads 27.8 / 25.9 / 25.1 / 24.9 / 23.1 / 22.2 / 21.8 / 20.7 / 19.5 / 17.8, which earlier versions cited as "PAWC" in error.) The verbatim Results section names a top-3 (Cite Sources, Quotation Addition, Statistics Addition) with a "30-40% relative improvement" range, NOT including Fluency Optimization despite its 3rd-place standalone PAWC; the paper names Cite Sources (4th by standalone PAWC) without stating why it is preferred over Fluency, and the only combination result the paper reports is Fluency plus Statistics outperforming any single method by more than 5.5% (an earlier "Average 31.4% in combinations" figure for Cite Sources was found unlocatable in the paper on 2026-06-05 and removed). Authoritative is a verbatim null ("to the contrary we find no significant improvement"); Keyword Stuffing is verbatim "little to no performance improvement" and on the Table 5 Perplexity.ai per-engine table (baseline 24.0) "performs 10% worse than the baseline." Testbed: GPT-3.5-turbo, top-5 Google sources, 2023. Primary-source re-verified 2026-05-30 against the ar5iv HTML mirror of arXiv:2311.09735.
² Puerto, Gubri, Green, Oh, Yun. "C-SEO Bench: Does Conversational SEO Work?" arXiv:2506.11097, NeurIPS 2025 Datasets & Benchmarks Track (v3, 2025-10-20). Tests 9 C-SEO methods (7 from Aggarwal plus 2 novel) and a traditional retrieval-ranking SEO baseline across 6 domains and 1,921 queries under continuous multi-actor adoption rates, scored by citation ranking on gpt-4o-mini and claude-3-5-haiku. Headline verbatim: "most current C-SEO methods are largely ineffective, contrary to reported results in the literature." Only 3 of 54 method-domain cells reached significance (Bonferroni-Holm, p<0.05); the retrieval-ranking baseline (moving the source to context position 1) was about 7.6x more effective than the best C-SEO method on retail (2.77 vs 0.36 mean rank improvement). The Discussion reconciles the two papers by metric: Aggarwal measures PAWC word-count share (single-actor), C-SEO Bench measures citation ranking (multi-actor), and it notes Aggarwal's own data "implicitly" indicates the methods do not generally improve citation ranking.




Last fact-checked 2026-06-05. Spotted an error or stale claim? See editorial methodology.

Changelog (4 entries)

2026-07-06: Perplexity citation confirmed for 'What is GEO content methods?', surfacing this pillar as its top source and reproducing our negative-result framing (most headline content tweaks were weak or null levers in Aggarwal 2023, and retrievability matters more than stylistic rewrites). Joins Gemini; 2 of 5 tested engines now cite it.
2026-06-21: Corrected the Aggarwal Table 1 figures throughout the methods table and footnote: the values previously cited as PAWC (Quotation Addition 27.8 vs baseline 19.5, and the rest) were the paper's plain Word Count sub-column. Updated to the paper's actual position-adjusted Word Count (the 'Overall' column: Quotation Addition 27.2 vs baseline 19.3 about +41%, Keyword Stuffing 17.7 about -8%), which is the metric the paper's headline gains are computed on. The verdicts, the named top-3, and the null/negative findings are unchanged.
2026-06-09: First citation, on Gemini. A 2026-06-09 Gemini answer (web search on) cited this pillar as a primary source with inline attribution, and surfaced the entry's null-result framing of authoritative tone. Gemini moves to cited; 1 of 5 engines now cited, four days from publish.
2026-06-05: Initial publish. The cluster pillar for GEO content methods: a one-page evidence synthesis across the method entries. Leads with the honest finding the field rarely gives: the foundational GEO paper names only three methods as effective, rates authoritative tone and keyword stuffing as verbatim null or negative, and the 2025 multi-actor C-SEO Bench re-test found most content methods largely ineffective once adopted at scale, with a retrieval-position baseline about 7.6 times stronger. All numbers are the cluster's ar5iv-re-verified Table 1 values; the comparison table is the single source of truth for the spokes.