AI crawler bots

What is an AI crawler bot?

An automated client, identified by its user-agent string plus IP signature, operated by an AI search engine. Each major AI engine runs at least one crawler bot, and often several for different purposes: training corpus collection, real-time retrieval, and user-initiated browsing. Site owners control bot access through directives in robots.txt.
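Because a user-agent string is trivial to spoof, the IP half of the signature is what makes a match trustworthy. A minimal sketch of that two-part check follows. The `PUBLISHED_RANGES` table and the `is_verified_crawler` helper are hypothetical, and the CIDR blocks are RFC 5737 documentation placeholders, not any operator's real ranges.

```python
import ipaddress

# Hypothetical ranges for illustration only (RFC 5737 documentation blocks).
# In practice these would come from whatever list the operator publishes.
PUBLISHED_RANGES = {
    "GPTBot": ["192.0.2.0/24"],
    "ClaudeBot": ["198.51.100.0/24"],
}


def is_verified_crawler(user_agent: str, remote_ip: str) -> bool:
    """Return True only if the UA names a known bot AND the IP is in that bot's ranges."""
    ip = ipaddress.ip_address(remote_ip)
    ua = user_agent.lower()
    for token, cidrs in PUBLISHED_RANGES.items():
        if token.lower() in ua:
            # The UA claims to be this bot; confirm the source address agrees.
            return any(ip in ipaddress.ip_network(cidr) for cidr in cidrs)
    return False
```

A request that claims to be `GPTBot` but arrives from an address outside the listed ranges returns `False`, so it can be treated as ordinary unverified traffic.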

Status in 2026 — known AI crawlers

Operator	Training	Retrieval	User-initiated
OpenAI	`GPTBot`	`OAI-SearchBot`	`ChatGPT-User`
Perplexity	`PerplexityBot`	`PerplexityBot`	`Perplexity-User`
Anthropic	`ClaudeBot`	`Claude-SearchBot`	`Claude-User`
Google	`Google-Extended` (training opt-out)	`Googlebot` (shared with classic)	—
Microsoft	`Bingbot` (shared with classic)	`Bingbot`	—
Apple	`Applebot-Extended` (opt-out signal)	`Applebot` (general/search)	—
ByteDance	`Bytespider`	(not separately disclosed)	—

Google-Extended and Applebot-Extended are unusual: they are robots.txt tokens that act as training opt-out signals, not separate crawlers. Both are allowed by default; blocking them removes your content from future model training but does not affect classic search indexing.
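As a rough illustration of what such an opt-out looks like in robots.txt, the sketch below renders one `Disallow` group per opt-out token and leaves every other user agent, including the classic search crawlers, on an allow-all default. The `render_robots_txt` helper is hypothetical; only the two token names come from the table above.

```python
# Opt-out tokens named in the table above.
TRAINING_OPT_OUT_TOKENS = ["Google-Extended", "Applebot-Extended"]


def render_robots_txt(blocked_tokens):
    """Render one Disallow group per blocked token, then an allow-all default group."""
    lines = []
    for token in blocked_tokens:
        lines += [f"User-agent: {token}", "Disallow: /", ""]
    # Everything not named above (Googlebot, Applebot, ...) stays allowed.
    lines += ["User-agent: *", "Allow: /"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_robots_txt(TRAINING_OPT_OUT_TOKENS))
```

Because `Googlebot` and `Applebot` are not named in any blocking group, they fall through to the default group, which matches the statement that classic search indexing is unaffected.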

How it relates to other concepts

Direct mechanism behind Generative Engine Optimization: block the retrieval and user-initiated bots and you self-exclude from AI citation.
Implementation detail of robots.txt configuration for AI-aware sites.
Companion to LLMS.txt — robots.txt controls bot access, LLMS.txt curates content for the bots you allow.
Distinguishing classic-search vs. AI-bot traffic in server logs is the foundation of measuring AI-search visibility before any vendor analytics tool is added.

FAQ

Should I allow all AI crawler bots?

For brands that want AI search visibility, yes: allow all of them in robots.txt. Blocking has almost no upside and a significant downside, because you self-exclude from citation opportunities. The exception is content covered by licensing deals, which are rare outside major publishers.

How do I tell AI crawler traffic from real human traffic in analytics?

Filter on the user-agent string. Vercel Analytics filters some bots but not all; for accurate human-only counts, use server access logs with explicit bot-pattern exclusion or a tool like Cloudflare bot management.
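A minimal sketch of user-agent filtering over values pulled from an access log, assuming the tokens in the table on this page. The `classify` and `count_by_category` helpers and the category names are hypothetical, and extracting the user-agent field from each log line is left to whatever log format the server uses.

```python
from collections import Counter

# Tokens from the crawler table, grouped by purpose. PerplexityBot is listed
# for both training and retrieval, so it is counted once under "retrieval".
AI_BOT_TOKENS = {
    "training": ("GPTBot", "ClaudeBot", "Bytespider"),
    "retrieval": ("OAI-SearchBot", "PerplexityBot", "Claude-SearchBot"),
    "user-initiated": ("ChatGPT-User", "Perplexity-User", "Claude-User"),
    # Shared with classic search, so not attributable to AI use alone.
    "shared-search": ("Googlebot", "Bingbot", "Applebot"),
}


def classify(user_agent: str) -> str:
    """Return the category of the first matching token, or "other" if none match."""
    ua = user_agent.lower()
    for category, tokens in AI_BOT_TOKENS.items():
        if any(token.lower() in ua for token in tokens):
            return category
    return "other"


def count_by_category(user_agents):
    """Tally an iterable of user-agent strings by category."""
    return Counter(classify(ua) for ua in user_agents)
```

Note that "other" means "not a listed bot", not "human": unlisted crawlers and spoofed user agents also land there, which is why a dedicated bot-management tool can still be worth adding.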

Does blocking GPTBot prevent ChatGPT from citing my site?

No. Blocking GPTBot prevents future training-data inclusion but does not prevent ChatGPT's search mode, which is a separate system from training, from retrieving the page in real time when web browsing is enabled. GPTBot is the training crawler; OAI-SearchBot is the retrieval crawler.
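The split can be checked locally with Python's standard-library `urllib.robotparser`. The sketch below assumes a robots.txt that blocks only `GPTBot`. The `allowed_tokens` helper and the `example.com` URL are illustrative, and the result shows how a standards-following parser reads the rules, not a guarantee of how any operator's crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# Example policy: opt out of training, stay open to everything else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed_tokens(robots_txt, tokens, url="https://example.com/article"):
    """Map each user-agent token to whether the rules let it fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {token: parser.can_fetch(token, url) for token in tokens}


if __name__ == "__main__":
    print(allowed_tokens(ROBOTS_TXT, ["GPTBot", "OAI-SearchBot", "ChatGPT-User"]))
```

Under these rules only `GPTBot` is refused; `OAI-SearchBot` and `ChatGPT-User` fall through to the allow-all group, so retrieval and user-initiated fetches remain possible.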
