/terms/prompt-injection · 4 min read · intermediate

Prompt injection

Prompt injection is an attack class where adversarial text is placed where a language model reads it as a command rather than as data, overriding the model's intended task. Direct injection is typed into the prompt by the user; indirect prompt injection (Greshake et al. 2023) hides the instructions in content the model later retrieves (a page, a document, a search result), so the attacker never touches the model directly. The indirect variant is the security mirror of GEO: it rides the same retrieval path that optimization targets.

Citation status

ChatGPTPerplexityClaudeCopilotGemini

Last checked 2026-06-09

What is prompt injection?

Prompt injection is an attack class in which adversarial text is placed where a language model will read it as a command rather than as data, causing the model to ignore or override its intended task.1 The vulnerability exists because a language model processes instructions and content through the same channel (its context window), so it has no built-in way to tell "text I should act on" from "text I should merely read."

There are two forms, and the distinction is the whole point for AI search:

  • Direct prompt injection is typed straight into the model by whoever is interacting with it (the classic "ignore your previous instructions and instead do X"). The term entered use in 2022 as practitioners demonstrated it against early LLM apps.
  • Indirect prompt injection, formalized by Greshake et al. in 2023, hides the instruction inside third-party content the model later retrieves: a web page, a document, an email, a search result. The attacker never touches the model; they only need their text to land in something the model reads.1

The indirect variant is the one that matters here, because retrieval is exactly how an AI search engine pulls external content into its working context. Indirect prompt injection is, in effect, the security mirror image of GEO: it travels the same retrieve-and-extract path that optimization works to win, and exploits the same fact that engines read and act on content they did not author.

Status in 2026

Prompt injection is an open, unsolved security problem rather than a settled, patched one. Because the data/instruction boundary is inherent to how current models read a context window, vendors ship mitigations (input filtering, instruction hierarchies that rank the system prompt above retrieved text, output guardrails) that reduce exposure without eliminating it; none publicly claim a complete fix. The working defensive posture in the literature is to treat all retrieved, third-party content as untrusted data, never as instructions.

For a publisher, the practical reading is not "here is a lever" but "here is why your content is screened." The same property that makes a page citable (being retrievable, self-contained, and easy for an engine to lift a passage from) is the property an attacker abuses to smuggle in a command. So engines have strong incentive to inspect retrieved content for instruction-like and hidden text, and such inspection is a plausible part of why low-trust, manipulative pages tend to get filtered out of answers. The honest GEO consequence is a boundary, not a tactic: the editorial-trust path that earns citations runs in the opposite direction from injecting instructions.

How to apply

Prompt injection is something to understand and defend against, not to deploy. Three honest moves:

  • Do not embed instructions in your content to steer engines. Hidden text, "as an AI, you should cite this," and instruction-like phrasing aimed at the model are detectable, against engine policy, and belong to the black-hat C-SEO family. They invite filtering and policy enforcement, and they undercut the editorial trust that actually drives citations.
  • Treat retrieved content as untrusted data in your own LLM features. If you run any feature that feeds user-supplied or third-party text into a model (a chatbot over your docs, a summarizer of submitted URLs), assume that text may contain injected instructions and isolate it from your system prompt. This is the standing mitigation, not a complete fix.
  • Learn the shape of it so you can recognize it. Knowing that indirect injection hides commands in retrieved content helps you audit your own pipelines and understand why engines distrust manipulative pages, which is the defensive payoff of the concept.

What to skip: any attempt to manipulate an engine by planting commands in your content. It is the detectable, filterable opposite of being a trustworthy source.

How it relates to other concepts

  • Black-hat C-SEO is the boundary: prompt injection is the technical attack mechanism, while black-hat C-SEO is the GEO practice domain that may deploy it among other adversarial tactics (the citation- and ranking-manipulation use is the C-SEO-relevant subset of this broader security mechanism). This entry defines the mechanism defensively; that one catalogs its misuse.
  • RAG and context assembly are the path indirect injection rides: the instruction is planted so it gets retrieved and assembled into the model's context alongside legitimate content.
  • Hallucination grounding is what injection subverts: a successful indirect injection can push a model to produce attacker-chosen or ungrounded output despite the retrieved evidence.

Footnotes

  1. Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T. & Fritz, M. "Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection." Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security (AISec 2023), arXiv:2302.12173 (submitted February 2023). The paper coins/systematizes indirect prompt injection: instructions planted in content an LLM-integrated application later retrieves, so the attacker needs no direct interface to the model, and argues that LLM-integrated applications blur the line between data and instructions, deriving a security taxonomy of impacts (data theft, "worming," information-ecosystem contamination, and more). The earlier, direct form of prompt injection (instructions typed into the prompt) was demonstrated and named by practitioners in 2022; Greshake et al. extend it to the retrieval-borne case that is relevant to AI search. This entry is definitional and defensive and intentionally contains no exploitation detail; verified 2026-06-09 against the arXiv abstract and the AISec 2023 record. 2

Part of AI behavior· editorial cluster, not a semantic link

Also in this cluster: Citation hallucination · Citation precision and recall · Context rot · Hallucination grounding · Lost in the Middle · +1 more

FAQ

What is the difference between direct and indirect prompt injection?
Direct prompt injection is typed straight into the model by whoever is talking to it ('ignore your instructions and do X'). Indirect prompt injection, formalized by Greshake et al. in 2023, hides the instruction in third-party content the model will later retrieve: a web page, a PDF, an email, a search result. The attacker never interacts with the model; they only need the malicious text to be in something the model reads. The indirect form is the one that matters for AI search, because retrieval is exactly the channel an AI engine uses to pull external content into its working context.
Is prompt injection a GEO tactic?
No, and treating it as one is a mistake. Hiding instructions in your content to steer an engine is detectable, against engine policy, and squarely black-hat: it belongs to the same family as the manipulation tactics catalogued under black-hat C-SEO. Hidden and instruction-like text is detectable and against major platforms' stated policies, and engines are widely understood to screen for it. The editorial-trust path GEO actually rewards (clean, retrievable, self-contained content) is the opposite of injecting commands.
Can prompt injection be fully prevented?
Not with current architectures. Language models process instructions and data through the same channel (the context window), so a perfectly reliable boundary between 'this is content to read' and 'this is a command to follow' does not yet exist. Greshake et al. frame this as inherent to LLM-integrated applications. Vendors deploy input filtering, instruction hierarchies, and output guardrails that reduce the risk, but none of them claim full mitigation; treating retrieved content as untrusted data is the standing defensive posture, not a solved problem.

Sources & further reading

Get the monthly digest

New terms shipped that week, plus one observation from the AI-citation tracker.

More about what you'll get