The Multi-Cloud Reality
Walk into the infrastructure review of any Fortune-500 company and you will almost certainly find services running on at least two cloud providers — often three. This is not an accident, a mistake, or a symptom of poor governance. It is the predictable outcome of how large organizations actually grow, acquire companies, negotiate contracts, and manage risk. Understanding the real drivers behind multi-cloud — and the hard limits of "cloud portability" — is the foundation every senior DevOps engineer needs before touching a single Terraform module or a cross-cloud VPN.
How Organizations Actually End Up Multi-Cloud
The narrative that engineering teams choose multi-cloud upfront is largely a myth. In practice, companies land in a multi-cloud posture through one of five paths:
- Mergers and acquisitions. Your company acquires a startup that ran entirely on GCP. The board expects the two businesses to be integrated within six months. Migrating 200 services to AWS in that window is not feasible. The realistic path is a hybrid network connection (VPN or Interconnect) while you plan a longer-term consolidation, which often never fully happens because the acquired services ship features faster than migration tickets get prioritized.
- Vendor-specific best-of-breed services. BigQuery is the analytics engine the data team considers best in its category. Microsoft Entra ID (formerly Azure Active Directory) is already the corporate identity provider because the company was Microsoft-first before it moved workloads to AWS. Snowflake runs on the cloud the data team chose before platform engineering existed. Each team picks the best tool for its job, and those tools span providers.
- Negotiation leverage. A $20M annual cloud spend gives you meaningful negotiating power — but only if the vendor believes you can move. Maintaining a real workload on a second provider, even a small one, is frequently justified internally as a hedge against price increases and lock-in. Finance teams understand this argument even when engineering teams resist the operational overhead.
- Regulatory and data-residency requirements. Some jurisdictions require data to remain in-country, and not every provider has a local region in every regulated market. A global SaaS company may serve EU customers from AWS eu-central-1, Japanese customers from GCP asia-northeast1, and Middle Eastern customers from Azure UAE North. The reason is not that the architecture is elegant. Those were the options that satisfied data-residency and contractual requirements in each market when the decision was made.
- Risk diversification after a major outage. The 2021 AWS us-east-1 outage and the 2022 GCP multi-region event reminded enterprises that any single provider can fail in ways that take hours to resolve. The 2024 CrowdStrike incident, although caused by a faulty endpoint-security update rather than a cloud provider, reinforced how much damage a single shared dependency can do. Boards and CISOs increasingly require a documented failover capability on a second provider for tier-1 services, regardless of the engineering cost.
The Portability Myth
Every cloud provider — and every Kubernetes vendor — sells the idea that their platform is portable: move your workloads anywhere. The reality is more constrained. Portability exists at different layers, and each layer has a different cost.
- Container-layer portability (high, cheap): A Docker image built for linux/amd64 or linux/arm64 runs identically on EKS, GKE, or AKS. The compute layer is genuinely portable. This is the layer Kubernetes was designed to standardize.
- Infrastructure-layer portability (medium, expensive): Terraform modules abstract provider APIs behind a consistent HCL interface, but a module that provisions an AWS ALB cannot provision an Azure Application Gateway by changing a variable. You need parallel modules, parallel state files, and parallel CI pipelines. The abstraction cost is real engineering time.
- Managed-service portability (low, very expensive): Aurora PostgreSQL is PostgreSQL-compatible, but it is not stock Postgres. Cloud Spanner is a proprietary system, not an implementation of an open standard. BigQuery's SQL dialect, partition strategies, and slot-based pricing have no direct equivalent on AWS or Azure. The moment your application depends on managed-service features beyond plain PostgreSQL, you have accepted lock-in at the data layer, and data-layer lock-in is the hardest to reverse.
- Operational portability (lowest, hardest): Your teams know CloudWatch, not Azure Monitor. Your on-call runbooks reference aws ec2 describe-instances, not az vm list. Cognitive overhead is real. Multi-cloud doubles the tool surface your engineers must stay current on.
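To make the operational-portability cost concrete, here is a minimal Python sketch of a runbook lookup that maps one abstract operation ("list compute instances") to a provider-specific CLI invocation. The table name, the function name, and the GCP entry (gcloud compute instances list) are illustrative assumptions, not part of any existing tool; the AWS and Azure commands are the ones named above.

```python
# Hypothetical runbook helper: one abstract operation, one CLI per provider.
# Each new provider adds another row that on-call engineers must know.
LIST_INSTANCES = {
    "aws": ["aws", "ec2", "describe-instances"],
    "azure": ["az", "vm", "list"],
    "gcp": ["gcloud", "compute", "instances", "list"],  # assumed third provider
}


def list_instances_command(provider: str) -> list[str]:
    """Return the CLI argv that lists compute instances on the given provider."""
    try:
        # Copy the list so callers cannot mutate the shared table.
        return list(LIST_INSTANCES[provider.lower()])
    except KeyError:
        raise ValueError(f"no runbook entry for provider {provider!r}") from None


if __name__ == "__main__":
    for name in LIST_INSTANCES:
        print(name, "->", " ".join(list_instances_command(name)))
```

Even this single operation needs three different commands. A real runbook multiplies that by every operation the on-call rotation performs, which is the tool-surface growth described above.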
What Big-Tech Actually Standardizes
Rather than chasing full portability, high-performing engineering organizations standardize the things that genuinely matter across clouds:
- Identity and access: a single IdP (Okta, Entra ID, Google Workspace) federated to all three clouds via SAML/OIDC. Engineers log in once; role assumption is provider-specific but governed centrally.
- Secrets management: HashiCorp Vault (or an equivalent central secrets service) as the single source of truth for secrets, with cloud-provider auth backends. No secret lives only in one cloud's native secret manager; all secrets are managed through the central Vault API.
- Observability: a single pane of glass — Datadog, Grafana Cloud, or a self-hosted Prometheus/Thanos stack — that ingests metrics, logs, and traces regardless of provider. CloudWatch metrics are exported; GCP Cloud Monitoring metrics are exported. Engineers see one dashboard for the whole fleet.
- Cost visibility: a FinOps platform (Apptio Cloudability, CloudHealth, or the open-source OpenCost) that normalizes spend across providers into a single report with consistent tagging taxonomy.
- Networking: a defined transit architecture — typically AWS Transit Gateway + GCP HA VPN or dedicated interconnects — with consistent IP address management (IPAM) to prevent CIDR overlap across providers. Overlapping CIDRs in a multi-cloud network are one of the most painful production problems to remediate.
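The CIDR-overlap check that an IPAM process enforces can be sketched with Python's standard ipaddress module. This is a minimal illustration, assuming IPv4-only allocations kept in a simple name-to-CIDR mapping; the network names and address ranges below are hypothetical.

```python
from ipaddress import ip_network
from itertools import combinations


def find_cidr_overlaps(allocations: dict[str, str]) -> list[tuple[str, str]]:
    """Return pairs of allocation names whose CIDR blocks overlap."""
    networks = {name: ip_network(cidr) for name, cidr in allocations.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(networks), 2)
        if networks[a].overlaps(networks[b])
    ]


# Hypothetical allocations across three providers (IPv4 only).
ALLOCATIONS = {
    "aws-prod-vpc": "10.0.0.0/16",
    "gcp-analytics-vpc": "10.0.128.0/20",  # falls inside the AWS range
    "azure-identity-vnet": "10.1.0.0/16",
}

if __name__ == "__main__":
    for a, b in find_cidr_overlaps(ALLOCATIONS):
        print(f"overlap: {a} <-> {b}")
```

Running a check like this in CI before any new VPC or VNet is created is far cheaper than renumbering a network that is already carrying production traffic.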
Pragmatic Decision Framework: When to Go Multi-Cloud
Before adding a second cloud provider, run this checklist honestly. Every "no" increases the operational debt you are taking on:
- Do you have a dedicated platform engineering team that will own the cross-cloud tooling? (Solo DevOps engineers cannot sustain two cloud footprints at big-tech quality.)
- Is the use case genuinely better served by provider B, rather than provider B being in use only because a team chose it before platform standards existed?
- Have you modeled the steady-state operational cost — not just the migration cost — including on-call burden, tooling licenses, and training?
- Do you have a documented plan for cross-cloud incident response? Who owns the bridge call when the inter-cloud VPN goes down at 3 AM?
- Is the regulatory or business driver for multi-cloud documented and signed off by a stakeholder, rather than being an engineering preference dressed up as a strategy?
The Honest State of Multi-Cloud in 2025
After years of multi-cloud hype, the industry has settled into a pragmatic consensus: active-active multi-cloud for arbitrary workloads is not economically viable for most organizations. What works at scale is a tiered model:
- Tier 1 (primary cloud — 80-90% of spend): AWS or GCP as the primary platform. Deep native integrations, managed services, and provider-specific expertise. This is where your product runs.
- Tier 2 (secondary cloud — 10-20% of spend): a second provider for specific, justified workloads — BigQuery for analytics, Entra ID for identity, a secondary Kubernetes cluster for regulatory failover, or a managed database used by an acquired business unit.
- Tier 3 (standardization tooling — spans both): the cross-cutting concerns described above: Vault, a single IdP, unified observability, cost normalization. This tier is not a cloud provider; it is your internal platform layer.
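One way to check where an organization sits in this tiered model is to compute each provider's share of normalized spend. The sketch below assumes billing data has already been exported into records with a provider and a USD amount; the record shape, the sample figures, and the 80% threshold default are illustrative assumptions, not the output format of any particular FinOps tool.

```python
from collections import defaultdict


def spend_share(records: list[dict]) -> dict[str, float]:
    """Return each provider's fraction of total spend (values sum to 1.0)."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[record["provider"]] += record["usd"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {provider: amount / grand_total for provider, amount in totals.items()}


def primary_provider(shares: dict[str, float], threshold: float = 0.80):
    """Return the provider whose share meets the threshold, or None."""
    for provider, share in shares.items():
        if share >= threshold:
            return provider
    return None


# Hypothetical monthly spend, already normalized to USD.
RECORDS = [
    {"provider": "aws", "usd": 600_000.0},
    {"provider": "aws", "usd": 250_000.0},
    {"provider": "gcp", "usd": 120_000.0},
    {"provider": "azure", "usd": 30_000.0},
]

if __name__ == "__main__":
    shares = spend_share(RECORDS)
    for provider, share in sorted(shares.items()):
        print(f"{provider}: {share:.0%}")
    print("primary:", primary_provider(shares))
```

If no provider clears the threshold, spend is split more evenly than the tiered model assumes, which is worth investigating before adding further cross-cloud tooling.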
The tutorials ahead will teach you Azure and GCP in depth — their compute models, networking primitives, managed Kubernetes offerings, and DevOps toolchains. As you learn each service, keep coming back to this lesson's question: why would an organization actually need this, and what is the true operational cost of adding it? That question separates engineers who deploy infrastructure from engineers who design it.