AI access control

AI access control is the umbrella term for the set of signals a website uses to govern how AI systems discover its content, fetch it, identify themselves when requesting it, and use it. It spans distinct questions answered by distinct mechanisms: robots.txt (may you fetch this), llms.txt (a curated map), AIPREF / Content-Usage (what you may use it for), and Web Bot Auth (who is asking). Most are voluntary, emerging, and easily conflated. The value of the term lies in keeping them separate, because blocking crawling, opting out of training, and verifying a bot are different actions with different signals.

AI access control is the umbrella term for the set of signals a website uses to govern how AI systems discover its content, fetch it, identify themselves when requesting it, and use it. There is no single switch. Instead there is a small family of mechanisms, each answering a different question. The recurring mistake is to collapse them into one imagined "AI opt-out" when they are genuinely separate actions.

"Access control" is used here in a broad publisher-policy sense: most of the mechanisms below are signals or preferences, not hard technical access controls (authentication, paywalls, WAF rules). The surface also spans both restriction (access, usage, identity) and discovery (notifying engines of new content), so it is better read as a control-and-discovery map than as a set of blocks.

The control surface breaks into distinct questions:

Question	Signal	What it does	Honest status
May a bot fetch this URL?	robots.txt (RFC 9309)	Legacy crawl-access directives	Standardized, but voluntary: compliant bots honor it, others may not
What is worth reading here?	llms.txt	A curated, AI-readable map of the site	Community proposal; no major engine has confirmed using it
What may the content be used for?	AIPREF / Content-Usage	Usage-preference vocabulary (`train-ai`, `search`; `y`/`n`)	IETF working group, pre-standardization; voluntary
Who is the requester?	Web Bot Auth	Cryptographic verification of a bot's declared identity	Emerging IETF-track; verifies identity, does not itself block
How do I notify engines of changed URLs?	IndexNow	Instant URL push to participating engines	Live and adopted, but a discovery push, not a control

All of these signals are aimed at AI crawler bots (GPTBot, PerplexityBot, ClaudeBot, and others), which span training crawlers, search and retrieval crawlers, and user-triggered agents.

Two rows sit in the map for context rather than because they restrict anything: llms.txt only guides what AI is pointed to read, and IndexNow only accelerates discovery. Neither controls access or usage; they are included because publishers weigh them alongside the genuine access, usage, and identity signals.

Status in 2026

The defining feature of AI access control in 2026 is that it is fragmented and mostly voluntary. Each question above has its own signal at its own maturity: robots.txt is a settled standard, IndexNow is live, Web Bot Auth and AIPREF are emerging IETF-track work, and llms.txt is a community proposal with no confirmed engine support. There is no unified "AI policy" file that answers all of these questions at once, and the efforts that exist deliberately stay in their own lanes (the AIPREF charter, for instance, puts crawler authentication explicitly out of scope, leaving that to Web Bot Auth).

The practical consequence is that "controlling AI access" is not one decision but several, and most of the levers are requests rather than guarantees. robots.txt, llms.txt, and AIPREF all rely on the AI operator choosing to comply. Web Bot Auth changes the picture only in that it lets a site verify identity (so it can decide whom to serve), but verification is not itself a block. What none of these provides is enforcement against an operator who ignores them; that requires technical access control (authentication, rate limits, WAF rules) and, ultimately, contracts and licensing.
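To illustrate the point that verification informs a decision but does not make it, here is a minimal sketch of a site-side rule for requests that claim to come from a bot. The signature check itself is not shown: `identity_verified` stands in for the assumed outcome of a Web Bot Auth verification, and the `BotRequest` type, the `SERVED_WHEN_VERIFIED` set, and the status codes are hypothetical choices, not part of any specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BotRequest:
    claimed_agent: str       # name taken from the User-Agent header, which anyone can spoof
    identity_verified: bool  # assumed result of a Web Bot Auth signature check (not shown)


# Hypothetical site policy: the verified operators this site chooses to serve.
SERVED_WHEN_VERIFIED = {"PerplexityBot"}


def response_status(req: BotRequest) -> int:
    """Return the HTTP status the site would send for a self-declared bot."""
    if not req.identity_verified:
        # The claimed name cannot be trusted, so it decides nothing by itself.
        return 403
    # Verification only established who is asking; the site's own rule does the blocking.
    return 200 if req.claimed_agent in SERVED_WHEN_VERIFIED else 403


print(response_status(BotRequest("PerplexityBot", identity_verified=True)))   # 200
print(response_status(BotRequest("PerplexityBot", identity_verified=False)))  # 403
```

The enforcement in this sketch comes from the server refusing the request, which is a technical access control; the identity check only tells the server which rule to apply.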

How to apply

The useful move is to pick the signal that matches the action you actually want, and to stop expecting one of them to do another's job:

To limit fetching by compliant crawlers: use robots.txt. Understand it is honored by well-behaved bots and ignored by others, and that it governs fetching only, not downstream use.
To declare a usage preference (e.g. allow search, disallow training): use AIPREF (train-ai=n, search=y). Treat it as a machine-readable statement of intent whose effect depends on voluntary adoption, and note the attachment syntax is still pre-standardization.
To guide what AI reads, not whether it may: use llms.txt as curation, while remembering no major engine has confirmed consuming it.
To know who is actually requesting: adopt Web Bot Auth as it matures, so you can distinguish a verified operator from a spoofed user agent before deciding how to respond.
To get new content noticed faster: use IndexNow. This is the discovery side of the surface, the opposite of restriction.
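The robots.txt item in the list above can be checked locally before deploying. The following is a minimal sketch using Python's standard-library `urllib.robotparser`; the rules, the example.com URL, and the helper name `may_fetch` are assumptions for illustration, not a recommended policy.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules: disallow one named crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return what a compliant crawler would conclude from these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(may_fetch(ROBOTS_TXT, "GPTBot", "https://example.com/article"))         # False
print(may_fetch(ROBOTS_TXT, "PerplexityBot", "https://example.com/article"))  # True
```

This only models the answer a well-behaved crawler would reach. A crawler that ignores robots.txt is unaffected, and nothing here says anything about what may be done with content once it has been fetched.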

Worked example, to make the disambiguation concrete: to stay citable in AI search while opting out of training, you might set AIPREF train-ai=n, search=y, keep robots.txt allowing search crawlers, and adopt Web Bot Auth so a verified search crawler can be told apart from a spoofed trainer. That is four settings across three mechanisms forming one coherent policy, with no single "opt-out" switch.
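The usage-preference part of this example can be sketched as follows, assuming the comma-separated key=value form used in this article (train-ai=n, search=y) and a response header named Content-Usage. The attachment syntax is still pre-standardization, so the serialization, the helper name `content_usage_value`, and the header dictionary are illustrative assumptions rather than the final AIPREF format.

```python
def content_usage_value(preferences: dict[str, bool]) -> str:
    """Serialize usage preferences as 'key=y' / 'key=n' pairs.

    Assumed format: mirrors the train-ai=n, search=y notation in the text;
    the final AIPREF syntax may differ.
    """
    return ", ".join(
        f"{name}={'y' if allowed else 'n'}" for name, allowed in preferences.items()
    )


# Policy from the worked example: stay usable for search, opt out of training.
policy = {"train-ai": False, "search": True}

# Hypothetical response headers a site might attach to each page.
response_headers = {
    "Content-Type": "text/html; charset=utf-8",
    "Content-Usage": content_usage_value(policy),
}

print(response_headers["Content-Usage"])  # train-ai=n, search=y
```

Keeping the policy as a small dictionary and serializing it in one place means only the helper needs to change if the standardized syntax ends up different.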

What to skip:

Treating any one signal as a complete "AI opt-out." Blocking crawling, opting out of training, guiding reading, and verifying identity are four different actions; no single file does all four.
Assuming a preference signal is enforcement. robots.txt, llms.txt, and AIPREF are requests; an operator can ignore them. Reserve "control" language for technical access control and contracts.
Waiting for one unified standard before acting. The space is fragmented by design; use the mature signal for each goal now rather than expecting convergence soon.

How it relates to other concepts

AIPREF is the usage-preference layer: it answers "what may the content be used for," and its charter deliberately excludes the identity question, which is why it and Web Bot Auth are complementary rather than overlapping.
Web Bot Auth is the identity layer: cryptographic verification of which bot is asking, the prerequisite for any access decision that depends on trusting the requester.
llms.txt is the guidance layer: it shapes what AI reads rather than whether it may, and sits alongside (not inside) the access and preference signals.
IndexNow is the discovery side of the same surface: where the others restrict or qualify access, IndexNow accelerates it, which is why it belongs in the same map even though it points in the opposite direction.
AI crawler bots are the agents these signals target: the family of signals only makes sense relative to the user agents (GPTBot, PerplexityBot, ClaudeBot, and others) they are meant to govern.
Upstream of generative engine optimization strategy: deciding which signals to set (stay visible in AI search while declaring a training preference, for instance) is a GEO-policy decision that this map is meant to make legible.

FAQ

What is AI access control?

AI access control is the umbrella term for the signals a website uses to govern how AI systems discover its content, fetch it, identify themselves when requesting it, and use it. There is no single switch; instead there are several distinct mechanisms answering distinct questions: robots.txt controls whether a crawler may fetch a URL, llms.txt offers a curated AI-readable map of the site, AIPREF (the Content-Usage signal) declares what the content may be used for (such as training versus search), and Web Bot Auth cryptographically verifies which bot is making a request. Most are voluntary, and they are easily conflated.

How do I stop AI from training on my content?

There is no fully reliable technical switch, and this is exactly where the signals get conflated. Blocking a crawler in robots.txt may stop fetching by a compliant bot but does not by itself declare a training-use preference; declaring train-ai=n through AIPREF states a preference about training use but does not enforce it; and neither verifies who is actually requesting (that is Web Bot Auth). All of these depend on voluntary compliance by the AI operator. The honest answer is that you can express preferences and block compliant crawlers, but technical access control plus contracts and licensing, not a single tag, are what create an actual barrier. If content must not be accessed or used at all, the reliable path is not to publish it publicly: put it behind authentication, paywalls, or contractual licensing with technical controls.

Is robots.txt enough to control AI access?

No, for two reasons. First, robots.txt only governs fetch access (may a crawler retrieve the URL); it says nothing about what an AI system may do with content it has already obtained, which is the usage-preference question AIPREF addresses, nor about verifying the requester's identity, which Web Bot Auth addresses. Second, robots.txt is a voluntary standard: well-behaved bots honor it, but it is a request, not an enforcement mechanism. Treating robots.txt as a complete AI-control solution conflates fetch access with usage preference and identity, which are separate problems.

Last fact-checked 2026-06-02. Spotted an error or stale claim? See editorial methodology.

Changelog (2 entries)

2026-06-02: Initial publish: AI access control is the umbrella for the signals a site uses to govern how AI systems access and use its content. Joins the infrastructure cluster as the hub mapping four distinct, commonly-conflated questions to four mechanisms: robots.txt (fetch access), llms.txt (guidance), AIPREF / Content-Usage (usage preference), and Web Bot Auth (identity), plus IndexNow on discovery. Honest framing: most are voluntary, emerging signals, not enforcement; blocking crawling, opting out of training, and verifying a bot are different actions, and the hub's value is keeping them separate rather than collapsing them into one 'AI opt-out.'
2026-06-02: Peer-review pass (both reviewers found the hub sound). Clarified that 'access control' is meant in a broad publisher-policy sense (mostly signals and preferences, not hard technical controls) and that the surface spans control and discovery; relabeled IndexNow as a discovery push and noted llms.txt and IndexNow are in the map for context, not because they restrict access; named the crawler types (training, retrieval, user-triggered); added that content which must not be used should not be published publicly; added a worked four-signal example; and added primary-source anchors for Web Bot Auth, IndexNow, and llms.txt.