Cloud Architecture & Landing Zones

Multi-Account Strategy

Every large engineering organization running on a public cloud eventually hits a hard question: should all teams share one AWS account, or does each workload get its own? The consensus among large engineering organizations, and the approach AWS itself recommends, is multiple accounts, deliberately structured. This lesson explains why, how the account hierarchy is designed, and the practical mechanics of managing it at scale.

Why Single-Account Setups Break at Scale

A single AWS account appears simple — one place to look, one bill, one set of IAM policies. In practice it becomes a liability the moment you have more than two teams:

Blast radius is unlimited. A misconfigured S3 bucket policy in the data-science sandbox can expose production customer data. A runaway Lambda loop in a dev environment can exhaust the account-level Lambda concurrency quota, causing production invocations to be throttled.
Service quotas are shared. AWS quotas (EC2 vCPUs, Elastic IP addresses, Lambda concurrent executions, RDS instances) are scoped to an account, usually per Region, not to a team. One team's experiment starves everyone else.
IAM becomes a labyrinth. Hundreds of developers sharing one account means the permission model grows into thousands of bespoke policies. Least privilege becomes impossible to audit.
Cost attribution is guesswork. Even with diligent tagging, reserved capacity, data-transfer costs, and support fees cannot be cleanly allocated to a business unit from a single account.
Compliance scope balloons. PCI DSS and HIPAA assessments cover every system that can reach cardholder data or PHI, so the practical goal is to isolate that environment. If everything lives in one account, the auditor scopes the entire account, including all the unrelated workloads, which multiplies audit cost and risk.

The Core Principle: Account as Blast-Radius Boundary

An AWS account is a hard isolation boundary — not a soft one like a VPC or an IAM policy. Resources in one account cannot reach resources in another account unless you explicitly build cross-account trust (VPC peering, RAM shares, IAM role assumption). This property is what makes accounts the right unit of isolation. Design your account topology by asking: if this account is fully compromised or accidentally deleted, what is the maximum damage? Keep that answer small.

AWS Account ≠ AWS Organization: an account is a billing and permission boundary; an Organization is the management plane that groups accounts into OUs, applies SCPs, and aggregates billing. You can — and should — have both.

Reference Org Structure (Big-Tech Standard)

The following diagram shows the org hierarchy that mirrors what Landing Zone Accelerator on AWS and most enterprise blueprints deploy. Each box is an AWS account or an OU (Organizational Unit).

Standard AWS multi-account org hierarchy: OUs enforce guardrails; each leaf account isolates one environment's blast radius.

The Four Foundational Account Types

Every enterprise org needs at least these account categories:

Management account — the payer account. It owns the AWS Organization, enables SCPs, and receives the consolidated bill. Run zero workloads here. If it is compromised, an attacker can disband the entire org.
Security accounts (Audit + Log Archive) — read-only. CloudTrail logs, GuardDuty findings, Config rules stream here from all member accounts. Only the security team has console access; even prod engineers cannot read or delete these logs.
Infrastructure / Shared Services accounts — the Transit Gateway hub, Route 53 private hosted zones, shared AMI pipelines, Artifactory/Nexus mirrors. Centralizing network infrastructure here means VPC changes don't require touching every application account.
Workload accounts — one (or more) per environment per product team: payments-prod, payments-staging, payments-dev. A bug in payments-dev cannot touch payments-prod — the account boundary enforces it.
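The OU skeleton behind these account types can be sketched in Terraform. This is a minimal illustration, not a prescribed layout: the OU names (Security, Infrastructure, Workloads, Workloads-Prod, Workloads-NonProd) are assumptions chosen to match the categories above, and the configuration is assumed to run with management-account credentials against an organization that already exists.

```hcl
# Terraform: sketch of an OU skeleton (OU names are illustrative assumptions)

# Look up the existing organization to find the ID of its root.
data "aws_organizations_organization" "this" {}

locals {
  root_id = data.aws_organizations_organization.this.roots[0].id
}

resource "aws_organizations_organizational_unit" "security" {
  name      = "Security"         # Audit + Log Archive accounts
  parent_id = local.root_id
}

resource "aws_organizations_organizational_unit" "infrastructure" {
  name      = "Infrastructure"   # network hub and shared services accounts
  parent_id = local.root_id
}

resource "aws_organizations_organizational_unit" "workloads" {
  name      = "Workloads"
  parent_id = local.root_id
}

# Nested OUs let prod and non-prod accounts inherit different guardrails.
resource "aws_organizations_organizational_unit" "workloads_prod" {
  name      = "Workloads-Prod"
  parent_id = aws_organizations_organizational_unit.workloads.id
}

resource "aws_organizations_organizational_unit" "workloads_nonprod" {
  name      = "Workloads-NonProd"
  parent_id = aws_organizations_organizational_unit.workloads.id
}
```

Accounts such as payments-prod would then be created with their parent set to the matching nested OU, so they pick up that OU's guardrails automatically.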

Service Control Policies: The Guardrail Layer

SCPs are IAM-like JSON policies attached to OUs or individual accounts. They define the maximum permissions any principal in those accounts can have — they don't grant permissions, they restrict them. Even an account-level AdministratorAccess role cannot exceed what the SCP allows.

# SCP: deny disabling CloudTrail in production OU
# Attach to the Workloads-Prod OU in AWS Organizations
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyCloudTrailDisable",
      "Effect": "Deny",
      "Action": [
        "cloudtrail:DeleteTrail",
        "cloudtrail:StopLogging",
        "cloudtrail:UpdateTrail",
        "cloudtrail:PutEventSelectors"
      ],
      "Resource": "*"
    },
    {
      "Sid": "DenyLeavingOrg",
      "Effect": "Deny",
      "Action": "organizations:LeaveOrganization",
      "Resource": "*"
    },
    {
      "Sid": "DenyRootUser",
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "StringLike": {
          "aws:PrincipalArn": "arn:aws:iam::*:root"
        }
      }
    }
  ]
}

SCP gotcha: the management account is exempt. SCPs restrict member accounts only; they never apply to the management account, even when attached to the Root. That is one more reason to run no workloads there, because nothing running in it can be guardrailed. Attach production-specific SCPs at the OU level rather than the Root so they constrain only the accounts they were written for.
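One way to manage a guardrail like the CloudTrail statement above is to keep it in Terraform next to the OU it protects. The sketch below is illustrative: the resource names and the workloads_prod_ou_id variable are assumptions, and it presumes the service control policy type is already enabled for the organization.

```hcl
# Terraform: define an SCP and attach it to the production OU (sketch)

variable "workloads_prod_ou_id" {
  type        = string
  description = "ID of the Workloads-Prod OU (assumed to be created elsewhere)"
}

resource "aws_organizations_policy" "protect_cloudtrail" {
  name        = "protect-cloudtrail"
  description = "Deny disabling or modifying CloudTrail"
  type        = "SERVICE_CONTROL_POLICY"

  # jsonencode keeps the policy in HCL and guarantees well-formed JSON.
  content = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid    = "DenyCloudTrailDisable"
      Effect = "Deny"
      Action = [
        "cloudtrail:DeleteTrail",
        "cloudtrail:StopLogging",
        "cloudtrail:UpdateTrail",
        "cloudtrail:PutEventSelectors",
      ]
      Resource = "*"
    }]
  })
}

# Every account in the OU, including accounts added later, inherits the policy.
resource "aws_organizations_policy_attachment" "workloads_prod" {
  policy_id = aws_organizations_policy.protect_cloudtrail.id
  target_id = var.workloads_prod_ou_id
}
```

Attaching at the OU rather than per account is what lets newly vended accounts receive the guardrail without any extra wiring.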

Vending Accounts with IaC

Manually clicking through the Console to create new accounts does not scale. In mature organizations, account creation is a self-service pipeline: a developer opens a PR, a Terraform module provisions the account, applies the baseline SCPs, bootstraps the CDK/Terraform state bucket, and sends the new account ID to the team, all within minutes.

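# Terraform: create and configure a new workload account
# terraform/modules/workload-account/main.tf

resource "aws_organizations_account" "this" {
  name      = var.account_name          # e.g. "payments-prod"
  email     = var.account_email         # unique per account (AWS requirement)
  parent_id = var.parent_ou_id          # target OU: workloads-prod OU ID

  # Prevent accidental destroy: production accounts should never be
  # deleted via Terraform without a deliberate two-step override.
  lifecycle {
    prevent_destroy = true
  }
}

# Assume the OrganizationAccountAccessRole in the new account
# to lay down baseline resources.
# Caveat: the role ARN depends on an account ID that is unknown until the
# account exists, so the first run may need a targeted apply of the account
# resource (or a separate baseline configuration) before this provider works.
provider "aws" {
  alias  = "member"
  region = var.region
  assume_role {
    role_arn = "arn:aws:iam::${aws_organizations_account.this.id}:role/OrganizationAccountAccessRole"
  }
}

# Baseline: enable GuardDuty in the new account
resource "aws_guardduty_detector" "this" {
  provider = aws.member
  enable   = true
}

# Baseline: AWS Config recorder, required by the delivery channel below.
# Assumption: the recorder runs as the AWS Config service-linked role.
resource "aws_iam_service_linked_role" "config" {
  provider         = aws.member
  aws_service_name = "config.amazonaws.com"
}

# Recording starts only once an aws_config_configuration_recorder_status
# resource enables it (not shown here).
resource "aws_config_configuration_recorder" "this" {
  provider = aws.member
  name     = "default"
  role_arn = aws_iam_service_linked_role.config.arn
}

# Baseline: send Config snapshots to the centralized log-archive bucket
resource "aws_config_delivery_channel" "this" {
  provider       = aws.member
  name           = "central-config"
  s3_bucket_name = var.log_archive_bucket
  depends_on     = [aws_config_configuration_recorder.this]
}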
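In a PR-driven vending flow, the change a developer submits could be as small as one module call. The block below is a sketch under assumptions: the relative source path, the email address, the Region, the bucket name, and the OU ID placeholder are all hypothetical values, and the input names simply mirror the variables the module above reads.

```hcl
# Terraform: requesting a new account by calling the workload-account module (sketch)

module "payments_prod" {
  source = "./modules/workload-account"   # assumed path relative to the root configuration

  account_name       = "payments-prod"
  account_email      = "aws+payments-prod@example.com"   # hypothetical address; must be unique
  parent_ou_id       = "ou-xxxx-xxxxxxxx"                # placeholder for the Workloads-Prod OU ID
  region             = "us-east-1"                       # hypothetical Region
  log_archive_bucket = "example-org-log-archive"         # hypothetical bucket name
}
```

Because the module declares its own provider block, Terraform will not allow count or for_each on this call, so each account gets its own module block. Many teams treat that as a feature: one reviewed block per account.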

Cross-Account IAM: The Trust Model

Once accounts exist, humans and CI/CD pipelines need to cross account boundaries. The pattern is always the same: a role in the target account has a trust policy allowing an identity in a source account (or an IAM Identity Center permission set) to assume it. Credentials never leave their origin; the caller receives short-lived tokens from STS. A session lasts 1 hour by default and can be set anywhere from 15 minutes up to the role's maximum session duration, which is at most 12 hours.

A GitLab or GitHub Actions runner in the tools-prod account assumes deployer-role in payments-prod to deploy. The runner never holds long-lived payments-prod credentials. If the runner is compromised, blast radius is limited to what deployer-role can do — not the entire payments-prod account.
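The trust relationship in that example could be expressed roughly as follows, applied in the payments-prod account. It is a sketch only: the ci-runner role name and the tools_account_id variable are assumptions, and the permissions policy that decides what deployer-role may actually do is omitted.

```hcl
# Terraform: cross-account deploy role in the target account (sketch)

variable "tools_account_id" {
  type        = string
  description = "Account ID of tools-prod, where the CI runner's role lives"
}

resource "aws_iam_role" "deployer" {
  name                 = "deployer-role"
  max_session_duration = 3600   # seconds; sessions for this role last at most one hour

  # Trust policy: only the runner's role in tools-prod may assume this role.
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = "sts:AssumeRole"
      Principal = {
        AWS = "arn:aws:iam::${var.tools_account_id}:role/ci-runner"   # hypothetical role name
      }
    }]
  })
}
```

Naming a specific role as the principal, rather than the whole tools-prod account, keeps the set of identities that can deploy to production as small as possible.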

Billing Isolation and Chargeback

AWS Cost Explorer attributes costs to accounts natively — no tag schema required. Many finance teams create one account per business unit (even if the account contains multiple services) purely for clean chargeback. Combined with cost anomaly detection scoped per account and budgets with SNS alerts, you get granular spend visibility that a single-account tag-based approach can never reliably deliver.
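A per-account budget with an SNS alert might look like the sketch below, defined in the management account. The budget name, the 5000 USD limit, the 80 percent threshold, and both variables are illustrative assumptions, not recommended values.

```hcl
# Terraform: monthly cost budget scoped to one member account (sketch)

variable "account_id" {
  type        = string
  description = "Member account to track, e.g. the payments-prod account ID"
}

variable "alerts_topic_arn" {
  type        = string
  description = "SNS topic that receives budget alerts (assumed to exist)"
}

resource "aws_budgets_budget" "account_monthly" {
  name         = "payments-prod-monthly"
  budget_type  = "COST"
  limit_amount = "5000"
  limit_unit   = "USD"
  time_unit    = "MONTHLY"

  # Scope the budget to a single linked account; no tags involved.
  cost_filter {
    name   = "LinkedAccount"
    values = [var.account_id]
  }

  # Alert when actual spend passes 80% of the monthly limit.
  notification {
    comparison_operator       = "GREATER_THAN"
    threshold                 = 80
    threshold_type            = "PERCENTAGE"
    notification_type         = "ACTUAL"
    subscriber_sns_topic_arns = [var.alerts_topic_arn]
  }
}
```

The SNS topic's access policy has to allow the AWS Budgets service to publish to it, otherwise the alerts are silently dropped.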

Start with fewer accounts than you think you need. The overhead of managing 200 accounts with poor naming conventions, no IaC vending machine, and no SSO is worse than a well-structured 20-account org. Build the vending pipeline first; accounts are cheap to add later.

Common Failure Modes in the Wild

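Management account creep. Teams start deploying workloads in the management account "just for now." Three years later it runs 40 Lambda functions and 12 RDS instances, and migrating them out is a project in itself. SCPs cannot help here because they do not apply to the management account, so enforce the rule from day one by tightly restricting who can sign in to it and what their IAM permissions allow.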
Flat OU structure. Attaching SCPs directly to accounts instead of OUs means each new account needs manual SCP wiring. A misconfigured batch creates accounts with no guardrails. Use OUs; accounts inherit automatically.
Missing account baseline. A new account has no GuardDuty, no Config, no CloudTrail. It exists for six months before anyone notices an EC2 instance mining cryptocurrency. The account-vending pipeline must enable these on day zero.
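Sandbox accounts with prod-database connectivity. A developer's sandbox has a peering connection to the prod VPC "to test something." Sandbox SCPs should deny creating or accepting VPC peering connections (ec2:CreateVpcPeeringConnection, ec2:AcceptVpcPeeringConnection), and prod network owners should never accept peering requests from sandbox accounts.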
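For the sandbox case, a blanket deny on peering is the simplest guardrail to reason about. The sketch below assumes sandbox accounts never have a legitimate need for VPC peering; the policy name and the sandbox_ou_id variable are hypothetical.

```hcl
# Terraform: SCP that blocks VPC peering from sandbox accounts (sketch)

variable "sandbox_ou_id" {
  type        = string
  description = "ID of the Sandbox OU (assumed to exist)"
}

resource "aws_organizations_policy" "sandbox_no_peering" {
  name = "sandbox-deny-vpc-peering"
  type = "SERVICE_CONTROL_POLICY"

  content = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid    = "DenyVpcPeering"
      Effect = "Deny"
      Action = [
        "ec2:CreateVpcPeeringConnection",
        "ec2:AcceptVpcPeeringConnection",
      ]
      Resource = "*"
    }]
  })
}

resource "aws_organizations_policy_attachment" "sandbox" {
  policy_id = aws_organizations_policy.sandbox_no_peering.id
  target_id = var.sandbox_ou_id
}
```

Peering is only one path into a production network. If sandboxes can also attach to a shared Transit Gateway, that route needs its own guardrail.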

Multi-account strategy is not bureaucracy — it is the engineering discipline that makes it safe to let 500 engineers move fast without fear of stepping on each other. In the next lesson, we operationalize this with AWS Control Tower and Landing Zones, which automate the entire org bootstrap in hours.