A migration playbook for VPs of Infrastructure responsible for resilience and regulatory geography — what to keep single-region, what to make active-passive, and what to put on active-active without paying for the whole platform twice.
Single-region cloud was the default architecture for a generation of energy software platforms. It is no longer enough. Regulators ask where the data lives. Customers ask what happens when a region fails. A single-region platform does not have an answer to either question that holds up under scrutiny.
Most replatforming programs we audit start with too broad a scope. Teams try to move everything to active-active and run out of budget before they finish the workloads that actually need it.
Map every workload. Identify which need active-active multi-region (real-time customer-facing, regulatory-impactful), which need active-passive (operational analytics with longer RTO), and which are fine single-region (internal tooling, exploration).
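The mapping step can be sketched as a small classification rule. This is a minimal illustration, not a definitive rubric: the `Workload` fields, the `classify` function, and the 24-hour RTO ceiling for active-passive are all assumptions made for the example, and treating "customer-facing" and "regulatory-impactful" as either-or triggers is one reading of the criteria above.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    ACTIVE_ACTIVE = "active-active"
    ACTIVE_PASSIVE = "active-passive"
    SINGLE_REGION = "single-region"


@dataclass(frozen=True)
class Workload:
    # All fields are hypothetical inventory attributes.
    name: str
    customer_facing_realtime: bool
    regulatory_impact: bool
    rto_minutes: int  # recovery time the business says it can tolerate


def classify(w: Workload, passive_rto_ceiling_minutes: int = 24 * 60) -> Tier:
    """Assign a resilience tier; the RTO ceiling is an assumed threshold."""
    if w.customer_facing_realtime or w.regulatory_impact:
        return Tier.ACTIVE_ACTIVE
    if w.rto_minutes <= passive_rto_ceiling_minutes:
        return Tier.ACTIVE_PASSIVE
    # Nothing forces recovery within a day: single-region is acceptable.
    return Tier.SINGLE_REGION


inventory = [
    Workload("outage-map", True, False, 5),
    Workload("ops-analytics", False, False, 240),
    Workload("internal-wiki", False, False, 7 * 24 * 60),
]
tiers = {w.name: classify(w) for w in inventory}
```

Running the rule over the full inventory gives a first-pass tier for every workload, which the team can then challenge workload by workload before any budget is committed.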
Active-active requires multi-region data. The data strategy is the most consequential decision — replication topology, conflict resolution, regional shards for data residency, and the read/write split that survives a region event.
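One way to picture the read/write split over residency shards is a small router. The sketch below is illustrative only: the jurisdiction keys, region names, `Shard` type, and `route` function are hypothetical, and it assumes writes go to a single primary per shard while reads may be served by any healthy region permitted to hold that data.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass(frozen=True)
class Shard:
    primary: str                     # region that accepts writes
    replicas: Tuple[str, ...] = ()   # other regions allowed to hold the data


# Hypothetical residency map: jurisdiction -> regions permitted to hold its data.
SHARDS = {
    "TX": Shard(primary="region-a", replicas=("region-b",)),
    "CA": Shard(primary="region-c"),
}


def route(jurisdiction: str, op: str, healthy: Set[str]) -> Optional[str]:
    """Pick a region for a request, never leaving the shard's permitted regions.

    Returns None when no permitted region can serve the request, so the
    caller can queue, degrade, or trigger a promotion.
    """
    shard = SHARDS[jurisdiction]
    if op == "write":
        # Writes stay on the primary; promotion is a separate, deliberate step.
        return shard.primary if shard.primary in healthy else None
    # Reads prefer the primary, then fall back to permitted replicas.
    for region in (shard.primary, *shard.replicas):
        if region in healthy:
            return region
    return None
```

With `region-a` down, `route("TX", "read", {"region-b"})` still resolves, while the matching write returns `None`. The router fails closed rather than sending residency-bound data to a region that is healthy but not permitted.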
Migration runs incrementally. New workloads launch multi-region by default. Failover is tested on a schedule on real workloads, not theorized in a runbook.
Move workloads in the order that retires the most risk per week. Implement regional sharding where data residency requires it. Establish the failover-drill cadence the platform will run forever.
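Ordering by risk retired per week reduces to a simple ratio sort. The sketch below assumes each workload already has a risk score and an effort estimate; the `Candidate` type, the scoring scale, and the example figures are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Candidate:
    name: str
    risk_score: float    # hypothetical scale: higher means more exposure today
    effort_weeks: float  # estimated weeks to complete the multi-region cutover


def migration_order(candidates: Iterable[Candidate]) -> List[Candidate]:
    """Sort so the workload retiring the most risk per week of effort goes first."""
    items = list(candidates)
    for c in items:
        if c.effort_weeks <= 0:
            raise ValueError(f"{c.name}: effort_weeks must be positive")
    return sorted(items, key=lambda c: c.risk_score / c.effort_weeks, reverse=True)


backlog = [
    Candidate("billing-api", risk_score=8.0, effort_weeks=4.0),
    Candidate("outage-map", risk_score=9.0, effort_weeks=3.0),
    Candidate("reporting", risk_score=2.0, effort_weeks=2.0),
]
order = [c.name for c in migration_order(backlog)]
```

The ratio keeps a large, risky workload from automatically jumping the queue: a smaller workload that retires nearly as much risk in half the time goes first.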
If your platform is single-region and your resilience requirements are not, the answer is a multi-region replatform with bounded scope and tested failover.
Regional sharding with documented data flows. We have implemented this for state-specific data residency at two energy operators.
Active-active typically costs 1.6x to 2.2x the single-region baseline for the same workload. Active-passive typically costs 1.2x to 1.4x. Single-region stays at 1.0x. The platform-wide average depends on the workload mix; most operators land at 1.3x to 1.5x.
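The platform-wide figure is a weighted average of the per-tier multipliers. As a worked example, the sketch below blends the ranges quoted above over an assumed mix; the 30/40/30 split is hypothetical and is expressed as each tier's share of single-region baseline spend.

```python
from typing import Dict, Tuple

# Multiplier ranges quoted in the text, relative to a single-region baseline.
MULTIPLIERS: Dict[str, Tuple[float, float]] = {
    "active-active": (1.6, 2.2),
    "active-passive": (1.2, 1.4),
    "single-region": (1.0, 1.0),
}


def blended_multiplier(mix: Dict[str, float]) -> Tuple[float, float]:
    """Return the (low, high) platform-wide cost multiplier for a tier mix.

    `mix` maps tier name to its share of baseline spend; shares must sum to 1.
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("mix shares must sum to 1.0")
    low = sum(share * MULTIPLIERS[tier][0] for tier, share in mix.items())
    high = sum(share * MULTIPLIERS[tier][1] for tier, share in mix.items())
    return low, high


# Hypothetical mix: 30% active-active, 40% active-passive, 30% single-region.
EXAMPLE_MIX = {"active-active": 0.3, "active-passive": 0.4, "single-region": 0.3}
low, high = blended_multiplier(EXAMPLE_MIX)
```

For this mix the blend works out to roughly 1.26x to 1.52x, which brackets the typical platform-wide range. Shifting share out of active-active is the fastest way to pull the average down.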
Yes. Migration is incremental. The old footprint runs until each workload has cleared its multi-region cutover.
The framework applies to GCP and Azure. The specific patterns differ; the principles are the same.
A drill on a schedule, on real workloads, with measured RTO and RPO. Read-only failover before write failover. The first time you run it is not the day the region goes down.
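Measured RTO and RPO can be derived from four timestamps captured during each drill. This is a minimal sketch under stated assumptions: the `DrillRecord` fields and the pass/fail rule are hypothetical, RPO is approximated as the gap between the last replicated write and the failure, and read and write recovery are timed separately to match the read-first sequence.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict


@dataclass(frozen=True)
class DrillRecord:
    # Hypothetical timestamps collected by whoever runs the drill.
    failure_injected_at: datetime
    last_replicated_write_at: datetime
    reads_restored_at: datetime
    writes_restored_at: datetime


def measure(d: DrillRecord) -> Dict[str, timedelta]:
    """Compute measured RPO and the read and write RTOs for one drill."""
    return {
        "rpo": d.failure_injected_at - d.last_replicated_write_at,
        "read_rto": d.reads_restored_at - d.failure_injected_at,
        "write_rto": d.writes_restored_at - d.failure_injected_at,
    }


def passes(d: DrillRecord, rto_target: timedelta, rpo_target: timedelta) -> bool:
    """A drill passes only if full (write) recovery and data loss meet targets."""
    m = measure(d)
    return m["write_rto"] <= rto_target and m["rpo"] <= rpo_target


t0 = datetime(2024, 1, 15, 10, 0, 0)
drill = DrillRecord(
    failure_injected_at=t0,
    last_replicated_write_at=t0 - timedelta(seconds=20),
    reads_restored_at=t0 + timedelta(minutes=3),
    writes_restored_at=t0 + timedelta(minutes=12),
)
result = measure(drill)
```

Recording the numbers from every drill turns the targets into a trend line, so a regression in recovery time shows up in a scheduled exercise rather than during a real region event.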