Most agent projects fail on the architecture, not the model. An agent demo takes an afternoon. An agent you can put in front of customers, audit, and pay for at the end of the month takes architecture.
Teams obsess over which model to use, then ship a system with no guardrails, no traces, and no cost caps.
The common pattern: pick a model, wire it to your tools, demo it, then try to bolt on safety and observability after the first surprise bill.
The approach that works: name the seven layers first, choose one orchestration pattern, write the handoff contracts and guardrails up front, then trace and cap everything before real traffic.
Every production agent system has the same seven layers, whether you named them or not, including interface, orchestration, and agents and tools.
The orchestration pattern is the first real decision, and the one teams most often get wrong.
Guardrails are not a feature you add at the end. They are the rules that decide whether this ships.
A map of every layer in a production agent system, top to bottom. Read it one way and it is the request path.
Supervisor, hierarchical, pipeline, and peer-to-peer, laid out side by side: how each works, when to use it, and what to watch out for.
Blank rows you complete for every agent: one goal, the tools it can call, the actions it is allowed, and the rule that makes it escalate.
The three places to put a guardrail, a fill-in table for where a human must approve, and a final checklist you run as a gate before you ship.
Document where AI recommends, where it can act, and who owns each decision, then make sure clinicians know it so they are not the 75% who cannot say.
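The agent spec rows described above (one goal, the tools it can call, the actions it is allowed, and the rule that makes it escalate) can be written down as data before any model is wired in. What follows is a minimal sketch, not the document's own template: the `AgentSpec` class, the `refund-agent` example, and the dollar-amount escalation rule are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    """One filled-in row: goal, tools, allowed actions, escalation rule."""
    name: str
    goal: str                        # exactly one goal per agent
    tools: frozenset[str]            # tools the agent may call
    allowed_actions: frozenset[str]  # actions it may take on its own
    escalate_over_usd: float         # assumed rule: escalate above this amount

    def decide(self, action: str, amount_usd: float = 0.0) -> str:
        """Return 'allow', 'escalate', or 'deny' for a proposed action."""
        if action not in self.allowed_actions:
            return "deny"
        if amount_usd > self.escalate_over_usd:
            return "escalate"
        return "allow"


# Hypothetical example row.
refund_agent = AgentSpec(
    name="refund-agent",
    goal="Resolve refund requests for existing orders",
    tools=frozenset({"lookup_order", "issue_refund"}),
    allowed_actions=frozenset({"read_order", "refund"}),
    escalate_over_usd=100.0,
)
```

Keeping the spec as frozen data means the escalation rule can be reviewed and tested on its own, independent of whichever model ends up behind the agent.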
A model is something you call. An agent is something you trust to act. The architecture is everything that turns the first into the second safely.
This document is for heads of AI, engineering leads, and the teams shipping agents to production. It assumes you can build, and gives you the architecture decisions to make before you do. Plan about 90 minutes to work through it as a team.
Gartner points to three causes behind canceled agent projects: escalating costs, unclear business value, and inadequate risk controls. Each one traces back to architecture decisions that were never made. Cost caps, observability, and guardrails are not add-ons. They decide whether the project survives.
Scale the thresholds, not the structure. A low-risk internal tool needs lighter guardrails and approvals than a customer-facing agent that moves money. But every agent still needs one goal, a tool list, traces, and a step cap. The cheapest insurance you can buy is a hard limit on steps, tokens, and tool calls per task.
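A hard limit on steps, tokens, and tool calls per task can be a few lines of code. The sketch below is one possible shape, under stated assumptions: `TaskBudget`, `BudgetExceeded`, and `run_task` are hypothetical names, and the loop simulates an agent by charging a fixed number of tokens and one tool call per step instead of calling a real model.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a task crosses one of its hard limits."""


@dataclass
class TaskBudget:
    max_steps: int
    max_tokens: int
    max_tool_calls: int
    steps: int = 0
    tokens: int = 0
    tool_calls: int = 0

    def charge(self, *, steps: int = 0, tokens: int = 0, tool_calls: int = 0) -> None:
        """Record usage, then stop the task if any limit is exceeded."""
        self.steps += steps
        self.tokens += tokens
        self.tool_calls += tool_calls
        for name, used, cap in (
            ("steps", self.steps, self.max_steps),
            ("tokens", self.tokens, self.max_tokens),
            ("tool_calls", self.tool_calls, self.max_tool_calls),
        ):
            if used > cap:
                raise BudgetExceeded(f"{name}: {used} > {cap}")


def run_task(budget: TaskBudget, planned_steps: int, tokens_per_step: int) -> str:
    """Simulated agent loop: each step spends tokens and makes one tool call."""
    try:
        for _ in range(planned_steps):
            budget.charge(steps=1, tokens=tokens_per_step, tool_calls=1)
    except BudgetExceeded as exc:
        return f"stopped ({exc})"
    return "completed"
```

Scaling the thresholds then means changing three numbers per agent: a low-risk internal tool might get a generous `TaskBudget`, a customer-facing agent a tight one, while the enforcement code stays identical.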
This is a working document, not something you only read. The reference architecture and pattern table are there to give everyone the same map. The agent specs, handoff contracts, and approval tables are blank on purpose. The blank rows are the actual work.
Default to a supervisor pattern unless you can name the specific reason it falls short. It is the simplest to control and audit and covers most real work. Hierarchical, pipeline, and peer-to-peer each have a place, but complexity in orchestration multiplies cost and risk everywhere else.
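To make the supervisor default concrete, here is a minimal sketch of the pattern, assuming a toy setup: one supervisor routes every task, workers never call each other, and every routing decision lands in a single trace. The worker functions, the keyword-based `route` method, and the `Supervisor` class are hypothetical. A real supervisor would likely use a model to choose the route.

```python
from typing import Callable

Worker = Callable[[str], str]


# Hypothetical workers; in a real system each would wrap a model call.
def billing_worker(task: str) -> str:
    return f"billing handled: {task}"


def support_worker(task: str) -> str:
    return f"support handled: {task}"


class Supervisor:
    """Single point of control: routes each task and records an audit trail."""

    def __init__(self, workers: dict[str, Worker]) -> None:
        self.workers = workers
        self.trace: list[tuple[str, str]] = []  # (chosen route, task)

    def route(self, task: str) -> str:
        # Toy keyword routing, used here only to keep the sketch runnable.
        return "billing" if "invoice" in task.lower() else "support"

    def handle(self, task: str) -> str:
        name = self.route(task)
        self.trace.append((name, task))
        worker = self.workers.get(name)
        if worker is None:
            # No worker registered for this route: hand off to a person.
            return "escalated to human"
        return worker(task)


sup = Supervisor({"billing": billing_worker, "support": support_worker})
results = [sup.handle("Invoice 17 is wrong"), sup.handle("Reset my password")]
```

Because every task passes through `handle`, there is one place to add caps, approvals, and tracing, which is much of what makes this pattern simple to control and audit.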