Securing AI systems is the practice of protecting applications built on machine learning and large language models against the specific threats that come from how AI works, on top of the ordinary security every system needs. An AI system is still software running on infrastructure, so it inherits every traditional concern: access control, network security, secrets management, supply chain integrity. But AI adds new attack surfaces that traditional security was never designed to handle, because the model takes untrusted input, behaves probabilistically, draws on training data and external context, and can take actions on behalf of users. Securing AI systems means addressing both layers, the familiar and the new.
The new layer exists because an AI model is fundamentally different from conventional code. Conventional code does exactly what it was written to do, and security focuses on controlling who can run it and what it can reach. A model, by contrast, interprets natural language input and produces output based on patterns it learned, which means an attacker can influence its behavior simply by crafting the right input, with no need to find a bug or breach a perimeter. The instruction and the data arrive through the same channel, language, and the model cannot reliably tell the difference. This blurring is the root of most AI-specific threats.
The threats become more serious as AI systems gain capability and reach. A chatbot that only answers questions has a limited blast radius. An AI system that retrieves company documents, calls internal tools, queries databases, sends messages, or takes actions has a large one, because a successful attack can now exfiltrate data or trigger real operations. The agentic AI systems that became common through 2025 and 2026, which plan and execute multi-step tasks using tools, raise the stakes considerably, since a manipulated agent can do real damage with real credentials. Securing AI systems scales in importance with what those systems are allowed to do.
Securing AI systems is not a single product you buy or a checklist you complete once. It is a layered discipline that spans the model, the data it sees, the tools it can call, the infrastructure it runs on, and the humans who use and operate it. No single control is sufficient, because the attack surface is broad and the model's behavior is probabilistic rather than deterministic, which means defenses have to assume that some malicious input will get through and limit the damage when it does. The mindset is closer to securing a system that processes untrusted user input everywhere than to securing a fixed, predictable program.
This page covers what securing AI systems means, why AI introduces attack surfaces that traditional security does not cover, the main categories of threat, and the layered defenses that organizations use to keep AI systems safe in production. The specific attacks and tools will keep evolving quickly, as this is a fast-moving area. The underlying principle, that AI systems need both traditional security and a new layer of defenses aimed at the model, its data, and its actions, is durable and increasingly central to deploying AI responsibly.
The central new problem is that AI systems take untrusted input and interpret it as meaning, which conventional software does not do in the same way. A traditional application treats input as data to be validated and processed by fixed logic, so the classic defense is to validate and sanitize that input. An AI model treats input as language that can carry instructions, and because the model is built to follow instructions in language, an attacker can embed instructions in what looks like ordinary input and change the model's behavior. There is no clean boundary to validate, because the threat is in the meaning of the text, not its format.
This leads directly to the most discussed AI threat, prompt injection, where an attacker plants instructions that the model follows even though the developer never intended them. The instructions might come directly from a user trying to make the model misbehave, or indirectly from content the model reads, such as a web page, a document, or an email that contains hidden instructions. Because the model processes the developer's prompt and the attacker's text through the same channel, it can be tricked into ignoring its original instructions, revealing information it should not, or taking actions it should not. This is the AI equivalent of an injection attack, and it is unsolved in the general case.
Data is a second new attack surface, in several directions. The data a model is trained or fine-tuned on can be poisoned, so that an attacker influences the model's behavior by corrupting its training inputs. The data a model retrieves at runtime, in retrieval-augmented systems, can carry injected instructions. And the model can leak sensitive data in its output, either training data it memorized or context it was given, which becomes a serious problem when the model has access to confidential information. Each of these is a way that data, which traditional security treats mostly as something to protect at rest and in transit, becomes an active part of the attack surface in AI systems.
The third new surface is action. As AI systems move from answering questions to calling tools and taking actions, every capability you grant becomes something an attacker can try to misuse through the model. A model that can query a database, send an email, make a purchase, or modify a record can be manipulated into doing those things maliciously, with the system's own credentials and permissions. This is why agentic systems raise the stakes so much: the model is no longer just producing text that a human reviews, it is acting, and a manipulated model acting autonomously can cause real harm before anyone notices. Securing the actions is as important as securing the inputs.
Prompt injection and manipulation are the first category and the most distinctive. An attacker crafts input, directly or hidden in content the model reads, that overrides the developer's instructions and makes the model behave against its intended purpose. The consequences range from making the model say something embarrassing to extracting confidential context, bypassing safety controls, or triggering unauthorized actions in an agentic system. Indirect prompt injection, where the malicious instructions come from external content rather than the user, is especially dangerous because the user may be entirely innocent and unaware. There is no complete defense, which is why the layered approach focuses on limiting impact.
Data leakage and privacy violations are the second category. A model can reveal sensitive information in its output: data it memorized during training, confidential context it was given for a task, or another user's data in a multi-tenant system. This matters because AI systems are often given broad access to information to be useful, and that access becomes a liability if the model can be coaxed into disclosing it. Regulated data, personal information, trade secrets, and internal documents all become exposure risks when a model that can see them can also be manipulated into repeating them. Controlling what the model can access, and what it is allowed to say, is the defense.
Model and supply chain attacks are the third category. Models, like other software, come from somewhere, and using a model, library, or dataset from an untrusted source can introduce a backdoor or vulnerability. Training data can be poisoned to make a model behave maliciously under specific conditions that pass normal testing. Fine-tuning on attacker-controlled data carries similar risk. As the AI ecosystem grows, with many open models, libraries, and datasets shared widely, the provenance and integrity of these components become a real security concern, and the supply chain discipline that applies to software dependencies applies to AI components as well.
Abuse, denial of service, and cost attacks are the fourth category, and they are easy to overlook. AI systems are expensive to run, so an attacker who can drive up usage can drive up cost, sometimes deliberately, in what amounts to a financial denial of service. Attackers can also abuse a system's AI capabilities for their own ends, such as using a company's model to generate harmful content at the company's expense. And the resource intensity of AI means ordinary load can degrade or take down a system if it is not protected with rate limits and quotas. These threats are less exotic than prompt injection but just as real in production, and they need defending against.
Limiting access and permissions is the foundational defense, because it caps what any successful attack can do. The model should have the least access necessary, both to data and to tools, so that even if it is manipulated, it cannot reach what it does not need. An AI system that retrieves documents should see only the documents the current user is allowed to see, enforced outside the model rather than trusted to the model. An agentic system should be able to call only the tools its task requires, with the narrowest permissions on each. This least-privilege approach does not prevent manipulation, but it shrinks the blast radius when manipulation succeeds.
Input and output controls form a second layer. On the input side, systems filter and inspect what reaches the model, looking for known injection patterns and obviously malicious content, while accepting that this cannot catch everything because the threat is in meaning. On the output side, systems inspect what the model produces before it is acted on or shown, checking for leaked sensitive data, policy violations, and unsafe content, and blocking or modifying responses that fail. These guardrails are imperfect filters rather than guarantees, but in a layered defense they catch a meaningful fraction of attacks and reduce the chance that a manipulated model causes harm.
Keeping a human or a hard control in the loop for consequential actions is a third layer, and an important one for agentic systems. Rather than letting a model take irreversible or high-impact actions autonomously, the system requires approval, confirmation, or a deterministic check before those actions execute. A model can propose to send the email, issue the refund, or delete the record, but a guardrail outside the model decides whether it actually happens, often based on rules the model cannot override. This ensures that even a fully manipulated model cannot directly cause the worst outcomes, because the dangerous actions are gated by controls that do not depend on the model behaving correctly.
Monitoring, testing, and securing the surrounding system make up the fourth layer. AI systems need logging and monitoring of their inputs, outputs, and actions so that abuse and anomalies can be detected and investigated, the same way any production system is observed. They need adversarial testing, often called red-teaming, where people deliberately try to break the system's defenses before attackers do. And they need all the traditional security, since the AI sits on infrastructure with its own access controls, secrets, and dependencies that must be secured. The model-specific defenses sit on top of solid conventional security, and neither layer is sufficient alone, which is the core of the layered mindset.
The biggest difference is determinism. Traditional security largely concerns deterministic systems, where the same input produces the same output and a control either holds or it does not, so you can reason about whether a defense works. AI models are probabilistic, so the same input can produce different outputs, and a defense that blocks an attack today may not block a slight variation tomorrow. This means securing AI is less about building walls that definitely hold and more about reducing probability and limiting impact, which is an uncomfortable shift for security teams used to firmer guarantees. The defenses are statistical, not absolute.
The blurring of instructions and data is the second difference, and it has no clean traditional analogue. In conventional systems, code and data are separate, and a whole class of attacks comes from confusing the two, which we defend against with strict separation and validation. In AI systems, instructions and data arrive in the same natural language channel and the model cannot reliably tell them apart, so the strict separation that defends traditional systems is not available. This is why prompt injection is so hard: the very thing that makes the model useful, following instructions in language, is the thing the attacker exploits, and you cannot simply turn it off.
The expanding capability of AI systems changes the threat model over time, which is the third difference. Traditional applications have a relatively fixed set of capabilities, so their attack surface, while it can be large, is bounded and knowable. AI systems are increasingly given new tools, new data sources, and more autonomy, and every addition expands what a manipulated model can do. The threat model grows as the system grows, so securing AI is not a one-time effort but a continuous one that has to keep pace with the capabilities being added. A security review done before the system gained tool access is out of date once it has them.
The fast pace of the field is the fourth difference. AI capabilities, attacks, and defenses are all evolving quickly, faster than most areas of security, which means the specific threats and the best defenses against them change month to month. New attack techniques appear, new model behaviors create new risks, and new defensive tools emerge, so a team securing AI systems has to stay current in a way that more stable areas of security do not demand. This is not a reason to wait, since the systems are being deployed now, but it is a reason to treat securing AI as an evolving practice rather than a problem that gets solved and stays solved.
An internal knowledge assistant shows access control as the primary defense. A company builds an assistant that answers employee questions by retrieving from internal documents, and the obvious risk is that an employee, through clever prompting, gets the assistant to reveal documents they should not see. The defense is not to trust the model to enforce permissions but to enforce them in the retrieval layer, so the assistant can only ever retrieve documents the asking user is authorized for. Even a fully manipulated model cannot leak what it was never given, which is least privilege doing exactly what it is meant to do.
A customer-facing agent shows gated actions in practice. A support agent that can issue refunds and modify accounts is valuable but dangerous, because a customer might try to manipulate it into issuing a refund it should not. The defense is to gate the consequential actions: the model can propose a refund, but a deterministic rule outside the model checks eligibility and limits before any money moves, and large or unusual refunds require human approval. The model's manipulation, if it happens, stops at a proposal, because the action itself is controlled by logic that does not depend on the model behaving well.
A document-processing pipeline shows the indirect injection threat. A system that summarizes incoming documents or emails using an AI model can be attacked by an adversary who puts hidden instructions inside a document, so that when the model reads it, the instructions try to make the model exfiltrate data or misbehave. The user submitting the document may be entirely innocent. The defenses are to treat all retrieved and submitted content as untrusted, limit what the model can do with the results, inspect outputs before acting on them, and avoid giving the processing model access to sensitive tools, so that even a successful injection has nowhere to go.
These examples share a pattern even though their surfaces differ. The defense assumes the model can and will be manipulated, and focuses on limiting what manipulation can achieve: tight access so the model cannot reach what it should not, gated actions so it cannot do the worst things directly, and untrusted treatment of all input so injected content has limited reach. Seeing the pattern across knowledge, action, and document use makes clear that securing AI is less about preventing every manipulation, which is not currently possible, and more about engineering the system so that manipulation does limited damage. That is the practical core of the discipline.
It means protecting AI applications against the threats specific to how AI works, in addition to all the traditional security any software needs. An AI system still runs on infrastructure with access controls, secrets, and dependencies that must be secured the usual way. On top of that, it has new attack surfaces, because the model takes untrusted natural language input, behaves probabilistically, draws on training data and external context, and can take actions. Securing AI systems addresses both layers, the familiar conventional security and the new model-specific defenses aimed at the input, the data, and the actions.
Because a model is fundamentally different from conventional code. Conventional code does exactly what it was written to do, so security focuses on who can run it and what it can reach. A model interprets natural language and produces output based on learned patterns, so an attacker can change its behavior just by crafting input, without finding a bug or breaching a perimeter. Instructions and data arrive through the same channel, and the model cannot reliably tell them apart. This creates threats like prompt injection and data leakage that traditional security was never designed to handle, so AI needs an additional layer of defense.
Prompt injection is an attack where someone plants instructions that the model follows even though the developer never intended them. The instructions can come directly from a user trying to make the model misbehave, or indirectly from content the model reads, such as a web page, document, or email containing hidden instructions. Because the model processes the developer's prompt and the attacker's text through the same channel, it can be tricked into ignoring its original instructions, revealing information, or taking unauthorized actions. It is the AI equivalent of an injection attack, and it is unsolved in the general case, which is why defenses focus on limiting impact.
There are four broad categories. Prompt injection and manipulation, where crafted input overrides the developer's instructions. Data leakage and privacy violations, where the model reveals memorized training data or confidential context. Model and supply chain attacks, where a poisoned dataset or compromised model or library introduces malicious behavior. And abuse, denial of service, and cost attacks, where an attacker drives up expensive AI usage or misuses the system's capabilities for their own ends. A real defense addresses all four, because focusing only on the most discussed threat leaves the others open.
With layered defenses, because no single control is enough. Limit the model's access to data and tools so manipulation cannot reach much. Treat all input and retrieved content as untrusted, and inspect both inputs and outputs as imperfect filters. Gate consequential actions behind deterministic checks or human approval so a manipulated model cannot directly cause the worst outcomes. Monitor inputs, outputs, and actions to detect abuse, and red-team the system to find weaknesses before attackers do. All of this sits on top of solid traditional security for the surrounding infrastructure, and the layers overlap so that what one misses another may catch.
Not in the general case, at least not with current technology. The reason is that following instructions in language is exactly what makes the model useful, and an attacker exploits that same ability by embedding instructions in input. There is no clean boundary between legitimate instructions and malicious ones, because both arrive as natural language and the model cannot reliably tell them apart. Input filtering catches some attacks but not all, since the threat is in meaning rather than format. This is why mature defenses assume injection can succeed and focus on limiting what a manipulated model can reach and do, rather than promising to block every attempt.
Because they take actions, not just produce text. A chatbot that only answers questions has a limited blast radius, since a human reads the output before anything happens. An agentic system plans and executes multi-step tasks using tools, so it can query databases, send messages, make purchases, or modify records with the system's own credentials. A manipulated agent acting autonomously can cause real harm before anyone notices. This is why the defenses for agentic systems emphasize gating consequential actions behind deterministic checks or human approval, so even a fully manipulated agent cannot directly trigger the most damaging operations.
In several ways. AI models are probabilistic, so the same input can produce different outputs and defenses reduce probability rather than guarantee prevention, unlike the firmer guarantees of traditional controls. Instructions and data blur in AI because they share the natural language channel, removing the strict code-data separation that defends conventional systems. The threat model grows as the system gains tools, data, and autonomy, so securing AI is continuous rather than one-time. And the field moves fast, with attacks and defenses evolving month to month. These differences make securing AI a statistical, evolving discipline rather than a fixed checklist.
It is an ongoing practice, not a product. Tools exist that help with input and output filtering, monitoring, and testing, and they are useful, but no product secures an AI system on its own. Security spans the model, the data it sees, the tools it can call, the infrastructure it runs on, and the humans who use and operate it, and it has to keep pace as the system gains capabilities and as new attacks emerge. The right mindset is layered defense plus continuous attention, treating securing AI like securing any system that processes untrusted input everywhere, but with the added difficulty that the model is probabilistic.