Most teams reach for the most complex option first. That’s usually the expensive mistake.
A generalist builds to the thinnest layer that works everywhere, so you miss the managed services that would have saved you months and money. An AWS-native team builds with the platform, not around it: Bedrock for foundation models, the Well-Architected Framework for security, cost, reliability and performance, and HIPAA-eligible architectures using the services AWS supports under a BAA.
Four tools, four jobs. They’re often combined, but they’re not interchangeable.
Prompting is the underrated baseline. A good prompt with the right context handles more than people expect. Start here: if it works, you’ve saved yourself months.
RAG (retrieval-augmented generation) gives the model your knowledge at answer time. Retrieve the relevant documents and hand them over so it answers from your truth, not its training data. It is the right choice when the model needs facts specific to your business, especially ones that change.
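As a rough illustration of "retrieve the relevant documents and hand them over", here is a minimal sketch. Everything in it is an assumption made for the example: the `DOCUMENTS` contents, the function names, and the word-overlap scoring, which stands in for the embedding search a production system would more likely use. It stops at building the prompt and does not call any model.

```python
import re
from collections import Counter

# Hypothetical in-memory knowledge base (illustrative content only).
# A real system would read from a document store or vector index.
DOCUMENTS = {
    "pricing": "The Pro plan costs 49 dollars per month and includes 10 seats.",
    "refunds": "Refunds are available within 30 days of purchase.",
    "support": "Support is available by email on weekdays.",
}


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric words."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, k: int = 1) -> list[str]:
    """Return the names of the k documents sharing the most words with the question.

    Word overlap is a simple stand-in for embedding similarity.
    """
    query = tokenize(question)
    ranked = sorted(
        DOCUMENTS,
        key=lambda name: sum((tokenize(DOCUMENTS[name]) & query).values()),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, k: int = 1) -> str:
    """Put the retrieved text in front of the question so the model answers from it."""
    context = "\n".join(DOCUMENTS[name] for name in retrieve(question, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    print(build_prompt("How much does the Pro plan cost per month?"))
```

Because the facts live in `DOCUMENTS` rather than in the model, updating a price means editing one entry, with no retraining.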
Fine-tuning changes the model’s behavior by training it on your examples. Use it to teach a style, a format, or a narrow task the model keeps getting wrong, not to teach facts. Fine-tuning to inject knowledge is a common, costly misunderstanding.
Agents let the model take actions across multiple steps: call tools, query systems, make decisions, and chain the results together. They are powerful, but also the most complex and least predictable option. They are the right choice when the task genuinely needs multi-step autonomy.
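To make the multi-step loop concrete, here is a small sketch of an agent loop with a hard step limit. It is an assumption-heavy illustration, not a recommended framework: the tools, the `scripted_model` stand-in (which replaces a real model call so the example runs offline), and the `(action, argument)` protocol are all hypothetical.

```python
from typing import Callable

# Hypothetical tools; names and behavior are placeholders for illustration.
def lookup_order(order_id: str) -> str:
    return {"A100": "shipped"}.get(order_id, "unknown")


def refund_policy(_: str) -> str:
    return "Refunds are allowed within 30 days."


TOOLS: dict[str, Callable[[str], str]] = {
    "lookup_order": lookup_order,
    "refund_policy": refund_policy,
}

History = list[tuple[str, str]]  # (action, result) pairs seen so far


def scripted_model(history: History) -> tuple[str, str]:
    """Stand-in for a model call: decides the next (action, argument)."""
    if not history:
        return ("lookup_order", "A100")
    _, status = history[-1]
    return ("finish", f"Order A100 is {status}.")


def run_agent(model: Callable[[History], tuple[str, str]], max_steps: int = 5) -> str:
    """Let the model pick tools step by step, but never beyond max_steps."""
    history: History = []
    for _ in range(max_steps):
        action, arg = model(history)
        if action == "finish":
            return arg
        if action not in TOOLS:
            # Record the mistake so the model can see it on the next step.
            history.append((action, "error: unknown tool"))
            continue
        history.append((action, TOOLS[action](arg)))
    return "stopped: step limit reached"


if __name__ == "__main__":
    print(run_agent(scripted_model))
```

The step limit and the unknown-tool check are the parts worth noticing: because the model chooses each step, the surrounding code has to bound what it can do.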
The same build, two ways: what the generic path quietly costs you.
| If your problem is… | Start with | Why |
|---|---|---|
| The model needs your specific, changing knowledge | RAG | Inject facts at answer time; keep them current without retraining |
| The model’s tone, format, or a narrow task is off | Fine-tuning | Change behavior, not knowledge |
| The model just needs better instructions | Prompting | Cheapest, fastest; often enough on its own |
| The task needs multi-step actions and tool use | Agents | Autonomy across steps and systems |
| Knowledge + consistent format | RAG + light fine-tuning | Combine: facts from retrieval, behavior from tuning |
The expensive errors we see most often. Avoiding them is half the battle.
Many “agent” problems are really a deterministic workflow with one model call in the middle — far more reliable and cheaper to run.
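A sketch of what "a deterministic workflow with one model call in the middle" can look like, using a hypothetical ticket-routing task. The `ROUTES` table, the function names, and the keyword classifier (a stub standing in for the single model call) are assumptions made for the example.

```python
from typing import Callable

# Hypothetical routing table: the only labels the workflow will accept.
ROUTES = {
    "billing": "finance-queue",
    "bug": "engineering-queue",
    "other": "general-queue",
}


def keyword_classifier(text: str) -> str:
    """Stub standing in for the one model call in the workflow."""
    lowered = text.lower()
    if "invoice" in lowered:
        return "billing"
    if "error" in lowered:
        return "bug"
    return "other"


def handle_ticket(ticket: dict, classify: Callable[[str], str]) -> dict:
    # Step 1 (deterministic): validate the input.
    text = ticket.get("text", "").strip()
    if not text:
        return {"status": "rejected", "reason": "empty ticket"}

    # Step 2 (the single model call): classify the ticket.
    label = classify(text)

    # Step 3 (deterministic): never trust the label blindly; fall back if it is invalid.
    if label not in ROUTES:
        label = "other"

    # Step 4 (deterministic): route using ordinary code.
    return {"status": "routed", "queue": ROUTES[label]}


if __name__ == "__main__":
    print(handle_ticket({"text": "My invoice is wrong"}, keyword_classifier))
```

The control flow is fixed in code, so the model can only influence one decision, and even that decision is checked against an allowed set.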
Fine-tuning on facts bakes in a snapshot that’s stale the moment your pricing or policies change. Use RAG instead.
Teams build RAG or fine-tuning before testing whether a strong prompt already solves the problem. And without evals, you’re guessing whether a change helped.
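An eval does not have to be elaborate to replace guessing. The sketch below scores two versions of a system against the same fixed question set. The cases, the substring-match scoring rule, and both stub systems are hypothetical and exist only to show the shape of the comparison.

```python
from typing import Callable

# Hypothetical eval set: (question, text the answer must contain).
EVAL_CASES = [
    ("What is the refund window?", "30 days"),
    ("Which plan includes 10 seats?", "Pro"),
    ("When is support available?", "weekdays"),
    ("What does the Pro plan cost?", "49"),
]


def accuracy(system: Callable[[str], str], cases: list[tuple[str, str]]) -> float:
    """Fraction of cases whose answer contains the expected text (case-insensitive)."""
    hits = sum(expected.lower() in system(question).lower() for question, expected in cases)
    return hits / len(cases)


def baseline_system(question: str) -> str:
    """Stub for the current system: gives the same answer to everything."""
    return "Refunds are available within 30 days."


def candidate_system(question: str) -> str:
    """Stub for a proposed change: handles more question types."""
    lowered = question.lower()
    if "refund" in lowered:
        return "Within 30 days."
    if "seats" in lowered or "cost" in lowered:
        return "The Pro plan, at 49 dollars per month, includes 10 seats."
    return "I don't know."


if __name__ == "__main__":
    print("baseline: ", accuracy(baseline_system, EVAL_CASES))
    print("candidate:", accuracy(candidate_system, EVAL_CASES))
```

Running the same cases before and after each change is what turns "it feels better" into a number you can compare.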
Our take: the most common production pattern we ship is RAG for the knowledge, a little fine-tuning where format consistency matters, and agents reserved for the genuinely multi-step work.
We start from your problem, not the technique — looking at your use case, data, and constraints, then recommending the simplest architecture that hits your accuracy and reliability bar. We prove it with evals before adding complexity, so you can see whether each change actually helped rather than guessing.
Usually you should combine techniques rather than pick one. RAG plus light fine-tuning is a common, strong combination. Agents often sit on top of RAG so the model can both retrieve and act.
Yes, fine-tuning is worth it for behavior: a consistent format, a specific tone, or a narrow task the base model keeps getting wrong. It is not worth it for facts.
Start from the problem, not the technique. The architecture review looks at your use case, data, and constraints, and recommends the simplest architecture that hits your accuracy and reliability bar.
Bring your use case. We’ll tell you which architecture fits, where you can keep it simple, and where the complexity actually earns its place.