Is BM25 still relevant in the AI search era?

Yes. Hybrid retrieval systems typically run BM25 (or a BM25 variant) as their lexical layer, and pure semantic retrieval tends to underperform hybrid retrieval on queries with strong lexical signals. The shift to AI search has made BM25 more entrenched, not less.

Do I need to tune BM25 parameters as a content publisher?

No. BM25 has two tunable parameters (k1 and b), and engines set them internally. Your levers are at the content layer: exact-term usage, concept density, and avoiding length padding.

Why does BM25 beat newer methods on some queries?

Queries with strong lexical signals (acronyms, proper nouns, exact phrases) depend on exact token matches. BM25 captures those matches directly, while embedding models can blur them into near-synonyms. Engines run BM25 alongside embeddings to keep that exact-match baseline.

BM25 | GEO Glossary

What is BM25?

BM25 (Best Matching 25) is a probabilistic ranking function¹ developed in the 1990s by Stephen Robertson and colleagues at City University London, building on the probabilistic relevance framework Robertson had developed with Karen Spärck Jones. It scores documents against a query based on three signals: term frequency (how often does the document mention the query terms?), inverse document frequency (how rare are those terms across the corpus?), and document length normalization (discount matches in documents that are long relative to the corpus average).

The formula has two tunable parameters: k1 (controls term-frequency saturation) and b (controls length normalization strength). Standard defaults are k1 in the 1.2–2.0 range (1.2 is the most cited single value) and b=0.75; engines tune these per-corpus.
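
To make the roles of k1 and b concrete, here is a minimal sketch of a BM25 scorer. It assumes whitespace tokenization and the Lucene-style IDF variant (which never goes negative); the function name `bm25_scores` and the sample documents are illustrative, not taken from any particular engine.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document in `docs` against `query` with BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: how many documents contain each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Length normalization: b controls how much doc length matters.
        norm = 1 - b + b * len(tokens) / avgdl
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Assumed IDF variant (Lucene-style, always non-negative).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # k1 controls how quickly repeated mentions stop adding score.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * norm)
        scores.append(score)
    return scores


docs = [
    "bm25 is a ranking function used by search engines",
    "vector embeddings capture intent rather than exact strings",
    "the bm25 algorithm scores documents with term frequency and bm25 length normalization",
]
scores = bm25_scores("bm25 algorithm", docs)
```

In this toy corpus the third document ranks first because it contains both query terms, and the second scores zero because it contains neither. Real engines add analyzers (stemming, stop words) on top of this core.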

Status in 2026

Still production-standard despite being 30+ years old. BM25 is the default lexical scorer in Elasticsearch, OpenSearch, and Solr, and the lexical retrievers behind AI search products such as Perplexity, Microsoft Copilot, Claude search, and Google's AI Mode are understood to run BM25 or close variants. The 2024–2026 AI search wave reinforced BM25's relevance: pure-vector retrieval tends to underperform on queries with strong lexical signals, so hybrid systems treat BM25 as a permanent floor.

How to apply

You don't tune BM25 (engines do), but your content is the input that BM25 scores. Three writing-side levers:

Use the precise terms your audience uses: BM25 rewards exact-string match. If your audience searches "BM25 algorithm" and you write "the probabilistic ranking function" throughout, you lose the lexical signal even when semantic models match.
Concept density in the first 100 tokens: BM25 favors documents where query terms appear with high relative frequency. BM25 itself ignores where in a document a term appears, but when an engine scores per chunk, front-loading the dominant concept concentrates the term-frequency signal in the opening chunk.
Avoid unnecessary length-padding: BM25's length normalization (the b parameter) penalizes pages that are long relative to the corpus average. A 300-word focused page tends to outscore a 3000-word page that mentions the query terms about as often.
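
The length-padding lever can be illustrated with the term-frequency part of BM25 on its own. This is a rough sketch: IDF is omitted, `term_weight` is a hypothetical helper name, and the 1000-word corpus average and mention counts are made-up numbers chosen only to show the direction of the effect.

```python
def term_weight(tf, doc_len, avg_len, k1=1.2, b=0.75):
    """Term-frequency part of BM25 for one query term (IDF omitted)."""
    # Pages longer than the corpus average get a larger normalizer.
    norm = 1 - b + b * doc_len / avg_len
    return tf * (k1 + 1) / (tf + k1 * norm)


AVG_LEN = 1000  # hypothetical average page length in the corpus

# Same number of mentions, different page lengths.
focused = term_weight(tf=5, doc_len=300, avg_len=AVG_LEN)
padded = term_weight(tf=5, doc_len=3000, avg_len=AVG_LEN)

# A long page that mentions the term ten times as often.
long_but_dense = term_weight(tf=50, doc_len=3000, avg_len=AVG_LEN)
```

Under these assumptions `focused` is larger than `padded`, so padding alone lowers the score. `long_but_dense` is larger than both, which shows that the penalty applies to padding, not to length that carries proportionally more on-topic content.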

What to skip: keyword-stuffing. BM25 saturates term frequency (via the k1 parameter) — past a small number of occurrences, additional repetitions add little to the score and risk triggering anti-spam filters at higher layers.
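
The saturation effect is easy to see numerically. The sketch below assumes an average-length document (so the length normalizer equals 1) and k1=1.2; `tf_saturation` is an illustrative helper name.

```python
def tf_saturation(tf, k1=1.2):
    """BM25 term-frequency component for an average-length document."""
    return tf * (k1 + 1) / (tf + k1)


K1 = 1.2
gains = {tf: tf_saturation(tf, K1) for tf in (1, 2, 5, 10, 50)}

# The component can never exceed k1 + 1, however often a term repeats.
ceiling = K1 + 1

# Going from 1 to 2 mentions adds more than going from 10 to 50.
early_gain = gains[2] - gains[1]
late_gain = gains[50] - gains[10]
```

Here `early_gain` exceeds `late_gain` even though the later step adds forty more repetitions, which is why stuffing a term past the first few mentions buys almost nothing.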

How it relates to other concepts

Lexical-layer component of hybrid retrieval — paired with vector embeddings in production.
Complementary to vector embeddings — BM25 handles exact-string match, embeddings handle intent match.
Backbone of the retrieval stage in RAG pipelines.
Per-chunk scoring mechanism for sub-document retrieval in classical search backends.

¹ Robertson & Zaragoza, "The Probabilistic Relevance Framework: BM25 and Beyond." Foundations and Trends in Information Retrieval, 2009. The canonical retrospective on BM25's development and parameter tuning.
