Advanced Terraform & IaC Patterns

Terragrunt & DRY Pipelines

Terraform is a powerful IaC engine, but it has a structural blind spot: it offers no native mechanism for keeping the configuration that calls your modules DRY. As soon as you manage three environments (dev, staging, prod) across two regions, you find yourself copy-pasting the same backend block, the same provider version, and the same module source into dozens of leaf main.tf files. Terragrunt is the thin orchestration wrapper that solves exactly this: it keeps root config in one place, wires up remote state automatically, expresses inter-stack dependencies declaratively, and lets you run run-all apply to converge an entire environment in topological order. At large scale, it is the difference between a handful of shared config files and thousands of copy-pasted ones.

What Terragrunt Actually Is

Terragrunt is a Go binary that wraps terraform. Every Terragrunt command (terragrunt plan, terragrunt apply, terragrunt run-all apply) evaluates terragrunt.hcl, writes any generated backend and provider files into the working directory (a copy of the module under .terragrunt-cache when terraform { source } is set), then delegates to terraform. Your team does not write Terraform differently; they just stop writing boilerplate. Terragrunt reads terragrunt.hcl files written in HCL2 with a set of Terragrunt-specific constructs: the remote_state, dependency, generate, and include blocks, plus the inputs attribute.

Terragrunt is not a replacement for Terraform. It is a coordinator. Each unit of infrastructure is still a plain Terraform module. Terragrunt adds the layer above: where state lives, what order things run, and what inputs flow between stacks.

The Canonical Repo Layout

The standard Terragrunt repo separates the what (Terraform modules) from the where and how (Terragrunt live configs). A typical three-environment AWS platform layout:

infra-live/
├── terragrunt.hcl                  # root config — backend template, provider config
├── dev/
│   ├── account.hcl                 # dev account ID, region
│   ├── vpc/
│   │   └── terragrunt.hcl
│   ├── eks/
│   │   └── terragrunt.hcl
│   └── rds/
│       └── terragrunt.hcl
├── staging/
│   ├── account.hcl
│   ├── vpc/terragrunt.hcl
│   └── eks/terragrunt.hcl
└── prod/
    ├── account.hcl
    ├── vpc/terragrunt.hcl
    └── eks/terragrunt.hcl

The magic lives in the root terragrunt.hcl. Every leaf stack includes it, which means backend and provider configuration are written exactly once.

Root terragrunt.hcl — the Single Source of Truth

# infra-live/terragrunt.hcl

locals {
  # Walk up the directory tree to find the environment-level account.hcl
  account_vars = read_terragrunt_config(find_in_parent_folders("account.hcl"))
  env          = local.account_vars.locals.env          # "dev" | "staging" | "prod"
  aws_region   = local.account_vars.locals.aws_region
  account_id   = local.account_vars.locals.account_id
}

# Generate an AWS provider block automatically for every child stack
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "aws" {
  region = "${local.aws_region}"
  assume_role {
    role_arn = "arn:aws:iam::${local.account_id}:role/TerraformDeployRole"
  }
}
EOF
}

# One backend template — bucket + key built from directory path
remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
  config = {
    bucket         = "myco-tfstate-${local.account_id}"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = local.aws_region
    encrypt        = true
    dynamodb_table = "myco-tfstate-lock"
  }
}

The key function is path_relative_to_include(). For the stack at prod/eks/terragrunt.hcl it returns prod/eks, so the S3 key becomes prod/eks/terraform.tfstate — unique per stack, zero manual naming, zero risk of collision.

Use one S3 bucket per AWS account, not per environment. A single bucket with path-segmented keys is far easier to audit and lock down with bucket policies than a proliferation of per-environment buckets.

Leaf Stack terragrunt.hcl — Module Calls Without Boilerplate

# infra-live/prod/eks/terragrunt.hcl

include "root" {
  path = find_in_parent_folders()  # walks up until it hits infra-live/terragrunt.hcl
}

# Pull the VPC outputs from a sibling stack — Terragrunt wires this up automatically
dependency "vpc" {
  config_path = "../vpc"

  # Mock outputs for plan-only CI runs when the VPC does not yet exist
  mock_outputs = {
    vpc_id          = "vpc-00000000"
    private_subnets = ["subnet-00000001", "subnet-00000002"]
  }
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}

terraform {
  source = "git::https://github.com/myco/infra-modules.git//eks?ref=v3.7.0"
}

inputs = {
  cluster_name    = "prod-eks"
  vpc_id          = dependency.vpc.outputs.vpc_id
  subnet_ids      = dependency.vpc.outputs.private_subnets
  cluster_version = "1.30"
  node_groups = {
    general = { instance_type = "m6i.2xlarge", desired_size = 6 }
  }
}

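There is no backend.tf and no provider.tf in the stack directory. Terragrunt generates both at run time from the root config. A versions.tf is not generated by the root config shown above; version constraints come from the module itself unless you add a further generate block. The leaf file contains only what is unique to this stack: the module source, the version pin, and the inputs.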
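
The root config in this lesson has generate blocks for provider.tf and backend.tf only. If you also want Terragrunt to write versions.tf, one more generate block in the root file could do it. The following is a minimal sketch; the version constraints are illustrative assumptions, not values taken from this lesson.

```hcl
# Hypothetical addition to infra-live/terragrunt.hcl.
# Both version constraints below are placeholders chosen for illustration.
generate "versions" {
  path      = "versions.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
terraform {
  required_version = ">= 1.9.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}
EOF
}
```

This assumes the modules do not ship their own versions.tf or required_providers block. With if_exists = "overwrite_terragrunt", Terragrunt refuses to overwrite a file it did not generate, and Terraform rejects a module that declares required_providers twice.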

Dependency Wiring and the run-all Command

The dependency block is Terragrunt's most powerful feature. It reads the outputs of another stack (identified by config_path) and exposes them as a structured object. This replaces ad-hoc terraform_remote_state data sources and makes the dependency graph explicit and machine-readable.
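
For comparison, this is roughly what the eks module would need to contain to read the VPC outputs without Terragrunt. It is a sketch only: the account ID inside the bucket name and the region are placeholders, and the key follows the path-based naming from the root config.

```hcl
# Inside the eks Terraform module, without Terragrunt (illustrative only).
# The bucket's account ID and the region are placeholder values.
data "terraform_remote_state" "vpc" {
  backend = "s3"
  config = {
    bucket = "myco-tfstate-111111111111"
    key    = "prod/vpc/terraform.tfstate"
    region = "us-east-1"
  }
}

locals {
  vpc_id     = data.terraform_remote_state.vpc.outputs.vpc_id
  subnet_ids = data.terraform_remote_state.vpc.outputs.private_subnets
}
```

The bucket, key, and region are hard-coded inside the module here, which ties the module to one environment. The dependency block keeps that knowledge in the live config and passes plain input values to the module.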

With dependencies declared, you can converge an entire environment with a single command:

# Plan all stacks in prod in dependency order (parallel where safe)
terragrunt run-all plan --terragrunt-working-dir infra-live/prod

# Apply all stacks in prod — Terragrunt resolves the DAG automatically
terragrunt run-all apply --terragrunt-working-dir infra-live/prod

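# Apply the eks stack plus the stacks it depends on outside this directory
terragrunt run-all apply --terragrunt-working-dir infra-live/prod/eks \
  --terragrunt-include-external-dependencies

# Destroy in reverse-dependency order. The extra flag keeps going after a
# stack fails, so leave it off unless you accept a partial teardown.
terragrunt run-all destroy --terragrunt-working-dir infra-live/prod \
  --terragrunt-ignore-dependency-errors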

run-all builds a directed acyclic graph (DAG) from all dependency blocks it finds in the target directory tree. Independent stacks run in parallel; dependent ones wait. How much time this saves depends on the shape of the graph: a wide graph with many independent stacks gains the most, while a strictly linear chain gains nothing over sequential execution.

Terragrunt run-all resolves the dependency DAG — independent stacks (eks, rds, elasticache) apply in parallel after vpc completes.
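
To make the graph concrete, here is a sketch of a second stack that depends on vpc. The prod/rds directory, the rds module path, and the input names are hypothetical; only the dependency wiring mirrors the eks stack shown earlier.

```hcl
# infra-live/prod/rds/terragrunt.hcl (hypothetical sibling stack)

include "root" {
  path = find_in_parent_folders()
}

# Same upstream as eks, so run-all schedules rds and eks in parallel
# once vpc has finished.
dependency "vpc" {
  config_path = "../vpc"
}

terraform {
  # Hypothetical module path and version tag
  source = "git::https://github.com/myco/infra-modules.git//rds?ref=v3.7.0"
}

inputs = {
  # Input names are assumptions about the hypothetical rds module
  vpc_id     = dependency.vpc.outputs.vpc_id
  subnet_ids = dependency.vpc.outputs.private_subnets
}
```

When a stack only needs to run after another and consumes none of its outputs, Terragrunt's dependencies block (with a paths list) expresses the ordering without reading any outputs.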

DRY Config with account.hcl and region.hcl

A production Terragrunt repo typically has two or three levels of shared config files that Terragrunt reads with read_terragrunt_config():

account.hcl — at the environment directory level. Holds the AWS account ID, region, and environment name for that subtree.
region.hcl — at a region-level directory if you multi-region. Holds the region string.
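Root terragrunt.hcl — reads these files and generates provider and backend for every child automatically. The root config shown earlier reads only account.hcl; a region.hcl needs its own read_terragrunt_config() call.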
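
A root locals block that reads both files might look like the sketch below. It assumes a layout with a region directory between the environment and the stack (for example infra-live/prod/us-east-1/eks/), which differs from the single-region tree shown earlier, and it assumes region.hcl defines a local named aws_region.

```hcl
# Sketch of root locals for a multi-region layout (assumed, not the
# single-region layout used in the rest of this lesson).
locals {
  account_vars = read_terragrunt_config(find_in_parent_folders("account.hcl"))
  region_vars  = read_terragrunt_config(find_in_parent_folders("region.hcl"))

  account_id = local.account_vars.locals.account_id
  env        = local.account_vars.locals.env
  aws_region = local.region_vars.locals.aws_region
}
```

With this arrangement, every stack must sit beneath some region.hcl, because find_in_parent_folders raises an error when the named file is not found and no fallback is supplied.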

This means adding a fourth environment (e.g., perf) requires creating one directory, one account.hcl with three values, and copying the leaf terragrunt.hcl files. No backend blocks, no provider blocks, no versions files to touch.
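
The account.hcl for such a new environment needs only the three locals that the root config reads. A sketch with placeholder values:

```hcl
# infra-live/perf/account.hcl (hypothetical fourth environment)
locals {
  env        = "perf"
  aws_region = "us-east-1"    # placeholder region
  account_id = "111111111111" # placeholder 12-digit AWS account ID
}
```

The local names must match what the root terragrunt.hcl looks up (env, aws_region, account_id); the values themselves are assumptions for illustration.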

Never use run-all apply against production without a prior run-all plan reviewed in CI. The DAG execution is fast precisely because it is parallel — a bad change can race to completion across multiple stacks before you can interrupt it. Production applies should always be gated on a human-approved plan stored as a CI artifact.

Terragrunt in CI/CD Pipelines

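A common pipeline pattern runs run-all plan when a PR is opened or updated and run-all apply on merge to main, scoped to the changed environment directory. The workflow below is simplified: it omits cloud credentials and the approval gate on a stored plan that production applies need.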

# .github/workflows/terraform.yml (simplified)
name: Terragrunt Plan / Apply

on:
  pull_request:
    paths: ["infra-live/**"]
  push:
    branches: [main]
    paths: ["infra-live/**"]

jobs:
  detect-env:
    runs-on: ubuntu-latest
    outputs:
      env_dir: ${{ steps.detect.outputs.env_dir }}
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 2 }
      - id: detect
        run: |
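          # Take the first changed file that sits inside an environment directory.
          # A change to the root infra-live/terragrunt.hcl alone yields an empty env_dir.
          CHANGED=$(git diff --name-only HEAD~1 HEAD -- infra-live/ | grep -E '^infra-live/[^/]+/' | head -1 || true)
          ENV_DIR=$(echo "$CHANGED" | cut -d/ -f1-2)   # e.g. infra-live/prod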
          echo "env_dir=$ENV_DIR" >> "$GITHUB_OUTPUT"

  plan:
    needs: detect-env
    runs-on: ubuntu-latest
    if: github.event_name == 'pull_request'
    steps:
      - uses: actions/checkout@v4
      - uses: gruntwork-io/terragrunt-action@v2
        with:
          tf_version: "1.9.5"
          tg_version: "0.67.0"
          tg_command: run-all plan
          tg_dir: ${{ needs.detect-env.outputs.env_dir }}

  apply:
    needs: detect-env
    runs-on: ubuntu-latest
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v4
      - uses: gruntwork-io/terragrunt-action@v2
        with:
          tf_version: "1.9.5"
          tg_version: "0.67.0"
          tg_command: run-all apply
          tg_dir: ${{ needs.detect-env.outputs.env_dir }}

Pin both tf_version and tg_version in CI. Terragrunt releases are frequent and occasionally introduce behavioural changes. Uncontrolled version drift across developers and CI is a notorious source of "works on my machine" plan divergence. Keep a .terraform-version and a .terragrunt-version file in the repo root (the convention used by version managers such as tfenv and tgenv) and read them in your pipeline so that laptops and CI resolve the same versions.

Common Production Failure Modes

Stale mock outputs in plan. If a dependency's mock outputs do not match the real outputs, plans look clean but applies fail. Audit mocks whenever the upstream module changes its output shape.
Throttling on run-all apply. Each stack takes its own lock, but parallel stacks that share a DynamoDB lock table and the same AWS API rate limits can hit throttling under heavy parallelism. Set --terragrunt-parallelism 4 to bound concurrency in environments with many stacks.
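Orphaned state after directory rename. If you rename dev/rds/ to dev/aurora/, Terragrunt derives a new S3 key. The old state is orphaned; the new path starts empty and Terraform plans to create everything again. Move the state object to the new S3 key (dev/rds/terraform.tfstate to dev/aurora/terraform.tfstate in the bucket) before the directory rename lands. terraform state mv does not help here: it renames resource addresses inside one state file rather than relocating the file.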
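Generated files committed to git. Terragrunt writes provider.tf and backend.tf into working directories: under .terragrunt-cache/ when a module source is set, otherwise into the stack directory itself. Add .terragrunt-cache/ and the generated provider.tf and backend.tf to .gitignore; committing them breaks the DRY model.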

Mastering Terragrunt turns a brittle collection of environment-specific Terraform directories into a coherent, auditable, and fast infrastructure platform. The investment in the root config pays back the day a second engineer joins.