AWS Architecture AI Workloads Enterprise 2026

What the Reference Actually Looks Like

I have lost count of the enterprise teams that built their first production AI workload on AWS, ran into the same two or three problems, and rebuilt within nine months. The problems are not random. They cluster around specific layers of the architecture that vendor presentations gloss over because the gloss is more saleable than the detail.

A senior platform engineer at a regional bank described their team's first Bedrock-based deployment to me last year: "Everything that broke was something we did not know we needed until it broke." That is the common story. The reference architecture below covers the five layers where enterprise teams typically discover what they did not know they needed.

Real Estate Investment AI

Your models aren’t wrong. Your data is. Here’s how real estate teams fix AI failures before they cost millions.

Download

The layers are network and identity, data and storage, model access and inference, application and orchestration, and observability and governance. None is optional in production. Each one has specific AWS services that fit and ones that do not.

Layer One: Network and Identity

The first layer is where account structure, VPC design, and IAM policy live. The team that gets this layer wrong spends quarters fixing it later. The team that gets it right rarely thinks about it again.

Multi-account is the baseline. Production AI workloads in their own account or set of accounts. Non-production workloads in separate accounts. AWS Control Tower or Organizations manages the structure. The blast radius of any single mistake stays inside one account.

VPC design has to account for Bedrock VPC endpoints, SageMaker network isolation, and the egress patterns that AI workloads create. Public internet egress from inference workloads is rarely necessary and often a security finding. Private endpoints cost money and pay back in audit defensibility.

IAM scoping is where most security findings concentrate. Bedrock model invocations, S3 access for training data and vector stores, KMS key access for encryption all need least-privilege roles. The pattern that works is per-workload roles, not per-team roles. A team's general role should not be able to invoke production Bedrock models.

The bank engineer I mentioned earlier put their team's lesson directly: "We started with one IAM role for everything because it was faster. We rebuilt eight months later because audit found seventeen ways the role could be used badly."

Layer Two: Data and Storage

The second layer is data infrastructure. AI workloads consume data, produce embeddings, and generate outputs. Each has different storage requirements.

Source data lives in S3 with appropriate access policies, classification tags, and lifecycle rules. The classification tags matter for the governance layer downstream. Data without tags has no governance unless someone manually checks every file, which nobody does.

Vector storage has three credible options on AWS: OpenSearch with vector search, Aurora with pgvector, and Pinecone or similar deployed in your VPC. The choice depends on workload patterns more than on AWS preference. For small to medium scale, pgvector if you already have Postgres. For larger scale with hybrid search, OpenSearch. Pinecone or Weaviate for specific feature needs.

Output storage is the one most teams forget. AI-generated content, agent action logs, prompt-completion pairs for analysis. These accumulate. They need their own retention policies, their own access controls, and often their own classification. Production AI systems produce more storage than the input data once they have run for a while.

Layer Three: Model Access and Inference

The third layer is how the application reaches models. Three patterns dominate.

The first pattern is direct Bedrock invocation. Application code calls Bedrock through the AWS SDK. Simple, well-supported, fine for many use cases. The pattern works until you need cross-region failover, tier routing across models, or detailed cost attribution per request.

The second pattern is Bedrock with an internal gateway. A thin internal API sits between application code and Bedrock. The gateway handles cross-region failover, model selection, cost tracking, and rate limiting. The build cost is small and pays back the first time a model has an outage.

The third pattern is multi-provider through a gateway. Bedrock plus OpenAI plus Anthropic direct plus self-hosted on SageMaker, with routing logic that selects per request. Justified for workloads where vendor lock-in matters or where capability mix favors multiple providers. More operational complexity than most workloads need.

For self-hosted models on SageMaker, the workload has to clear specific cost thresholds before the math works. Below roughly $30K monthly inference, Bedrock or external APIs usually beat self-hosted. Above that, the math becomes workload-specific and sometimes favors self-hosting.

Layer Four: Application and Orchestration

The fourth layer is where business logic and agent orchestration live. The choice here affects everything from latency to debuggability.

For simple AI features, Lambda invoking Bedrock directly works well. The serverless model fits the request-response shape of most AI calls. Cold start latency matters for some workloads and not others.

For multi-step workflows, Step Functions provides explicit orchestration when the workflow has discrete stages. Code-based orchestration in containers fits when the workflow is dynamic or latency-critical. Bedrock Agents fits when the workflow matches its abstractions.

For real-time conversational AI, ECS or EKS with persistent connections fits better than Lambda. The stateful nature of conversations and the streaming response pattern push toward longer-running compute.

The trap at this layer is over-engineering. I have watched teams build complex agent orchestration for workflows that a single Lambda invoking Bedrock with structured output would have solved. The complexity is paid in operational overhead, debugging difficulty, and engineering time. Simpler architecture, where it works, usually wins.

Layer Five: Observability and Governance

The fifth layer is the one most teams under-invest in until something breaks. CloudWatch captures basic metrics and logs. X-Ray traces requests through the AWS services. Both are necessary and not sufficient for AI workloads.

AI-specific observability needs prompt and completion logging with appropriate redaction, per-request cost attribution, eval results from continuous evaluation, and drift detection on model behavior. Tools like Helicone, Langfuse, or AWS-native combinations of CloudWatch and Bedrock logging cover this layer.

Governance artifacts also live here. Model risk inventory for systems subject to model risk management frameworks. Audit trails for systems subject to EU AI Act high-risk requirements. Data lineage from source through embeddings to outputs. The artifacts have to be produced as a side effect of operation, not generated for audits.

Most enterprise teams I have worked with treat this layer as an afterthought. The teams that treat it as a first-class concern from day one spend less total engineering time than the teams that retrofit it.

What This Costs

Building this reference for a moderate enterprise AI workload typically takes a small platform team three to six months for initial buildout plus ongoing capacity for sustained operation. The ongoing cost is small relative to the cost of getting one layer wrong and rebuilding under production pressure.

The five-layer reference is not specific to any single use case. Customer support automation, document processing, conversational AI, agentic workflows all benefit from the same underlying architecture. The layer-specific design varies; the layered structure stays.

PropTech AI Infrastructure Roadmap

They’re stuck because the data layer they need doesn’t exist yet

Download

Call to Action

What Logiciel Does Here

Logiciel works with enterprise platform and engineering teams designing AWS AI architectures from initial deployment or remediating architectures that have hit one of the layer-specific problems. The work usually starts with a layer-by-layer assessment.

The Cloud Architecture for AI Workloads framework covers the cost-spike patterns that cross-cut the five layers. The Software Architecture in the AI Era framework covers the cloud-neutral version of inference-boundary design.

A 30-minute working session is enough to assess your current AWS architecture against the five layers.

Frequently Asked Questions

Should I start with Bedrock or with external model providers?

Depends on workload. Bedrock fits when AWS integration is a priority and the available models cover your needs. External providers fit when capability needs exceed Bedrock's catalog or when you have existing integrations. Many production architectures use both.

When does SageMaker for self-hosted models make sense?

Above roughly $30K monthly inference spend on a specific narrow workload, plus when you have the operational capacity to manage GPU infrastructure. Below that threshold, managed alternatives almost always win.

How do I handle multi-region for AI workloads?

Bedrock is region-specific with cross-region inference now available for some models. The architecture should not assume single-region availability for production. Failover patterns matter for any workload that cannot tolerate regional outages.

What is the right team size to operate the five layers?

Smaller than most enterprises plan. Two to four platform engineers with AI experience can operate the layers for moderate scale. The team grows with workload complexity and regulatory exposure, not with workload count.

How does this change for regulated workloads?

Layer 1 (network and identity) and layer 5 (governance) carry more weight. Layers 2-4 are similar but with additional controls around data classification, audit logging, and human oversight points. Sources: - AWS Well-Architected Framework, Machine Learning Lens, 2024 - AWS Bedrock Documentation, 2024

AWS Architecture for AI Workloads: A Reference for Enterprise Teams