Terraform Fundamentals

Project: Provision a Web Stack with Terraform

This capstone lesson combines the concepts covered so far (HCL syntax, providers, variables, state, remote backends, data sources, meta-arguments, and modules) into a single end-to-end project. You will provision a load-balanced web stack on AWS: a custom VPC with public and private subnets across multiple availability zones, an Auto Scaling Group of EC2 instances behind an Application Load Balancer, and remote state stored in S3 with DynamoDB locking. This layout, a network module plus a compute module on top of a locked remote backend, is a common starting point for platform-engineering teams building foundational cloud workloads.

Project Directory Structure

Organize the project as a root module that calls two reusable child modules: modules/network for the VPC and subnets, and modules/compute for the load balancer, Auto Scaling Group, and security groups. Remote state is bootstrapped separately: the main stack never manages the S3 bucket and DynamoDB table that hold its own state file.

web-stack/
├── backend-bootstrap/      # One-time: creates the S3 bucket + DynamoDB table
│   └── main.tf
├── modules/
│   ├── network/
│   │   ├── main.tf         # VPC, subnets, IGW, NAT GW, route tables
│   │   ├── variables.tf
│   │   └── outputs.tf
│   └── compute/
│       ├── main.tf         # ALB, ASG, launch template, security groups
│       ├── variables.tf
│       └── outputs.tf
├── main.tf                 # Root: calls network + compute modules
├── variables.tf
├── outputs.tf
├── locals.tf
├── versions.tf             # Required providers + version constraints
├── backend.tf              # S3 remote backend config
└── terraform.tfvars        # Non-sensitive variable values

Bootstrap vs. managed state: The backend-bootstrap/ directory is a small, separate root configuration that uses local state and is applied once per environment. It creates the S3 bucket (with versioning and server-side encryption) and the DynamoDB lock table. Because those resources hold the main stack's state, the main stack must never manage them: a terraform destroy there would delete the state file along with everything else and leave any remaining infrastructure untracked.

Step 1 — Bootstrap Remote State

Before the main stack can use a remote backend, the backend infrastructure must exist. This is a one-time operation per environment. In CI pipelines at larger organizations, this step is often gated behind a separate "bootstrap" pipeline that requires SRE approval to run.

# backend-bootstrap/main.tf

terraform {
  required_providers {
    aws = { source = "hashicorp/aws", version = "~> 5.0" }
  }
}

provider "aws" {
  region = "us-east-1"
}

resource "aws_s3_bucket" "tf_state" {
  bucket = "acme-terraform-state-prod"

  lifecycle {
    prevent_destroy = true  # Guard against accidental wipe
  }

  tags = { ManagedBy = "terraform-bootstrap", Purpose = "terraform-state" }
}

resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  versioning_configuration { status = "Enabled" }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

resource "aws_s3_bucket_public_access_block" "tf_state" {
  bucket                  = aws_s3_bucket.tf_state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_dynamodb_table" "tf_lock" {
  name         = "acme-terraform-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }

  tags = { ManagedBy = "terraform-bootstrap", Purpose = "terraform-lock" }
}

# Run once:
# cd backend-bootstrap && terraform init && terraform apply

Step 2 — Root Module: Versions, Backend, and Locals

The versions.tf file constrains every provider with the pessimistic constraint operator (~>): "~> 5.50" accepts any 5.x release from 5.50 upward but never 6.0. Unconstrained providers are a common source of surprises in shared repositories: a terraform init -upgrade on a colleague's machine can pull a new major version with breaking changes, and an auto-applied plan built with it can damage real infrastructure. Commit the generated .terraform.lock.hcl so every machine and CI run resolves the same exact provider version.

# versions.tf
terraform {
  required_version = "~> 1.8"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.50"
    }
  }
}

# backend.tf
terraform {
  backend "s3" {
    bucket         = "acme-terraform-state-prod"
    key            = "web-stack/production/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "acme-terraform-lock"
    encrypt        = true
    kms_key_id     = "arn:aws:kms:us-east-1:123456789012:key/mrk-abc123"
  }
}

# locals.tf
locals {
  name_prefix = "${var.environment}-${var.project_name}"

  common_tags = merge(
    {
      Environment = var.environment
      Project     = var.project_name
      ManagedBy   = "terraform"
      Owner       = "platform-team"
    },
    var.extra_tags
  )

  # Compute AZ count: production gets 3, others get 2
  az_count = var.environment == "production" ? 3 : 2
}
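
The locals above read var.environment, var.project_name, and var.extra_tags, whose declarations are not shown in this lesson. A minimal sketch of how they might be declared in the root variables.tf follows. The types, the default, and the allowed environment names in the validation rule are assumptions for illustration, not part of the original project.

```hcl
# variables.tf (sketch; types, default, and validation values are assumed)

variable "environment" {
  description = "Deployment environment; drives name_prefix and az_count."
  type        = string

  validation {
    # Hypothetical allow-list; adjust to the environments you actually run.
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "environment must be one of: dev, staging, production."
  }
}

variable "project_name" {
  description = "Short project identifier used in resource names and tags."
  type        = string
}

variable "extra_tags" {
  description = "Additional tags merged over the common tag set."
  type        = map(string)
  default     = {}
}
```

With environment = "production" and project_name = "web" (hypothetical values), local.name_prefix evaluates to "production-web" and local.az_count to 3. Because merge() lets later arguments win, a key in extra_tags overrides the same key in the common tag map.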

Step 3 — The Network Module (VPC + Subnets)

The network module builds a tiered VPC: public subnets host the ALB and NAT Gateways, private subnets host the EC2 instances. Each AZ gets one public and one private subnet. Using for_each over a slice of the AZ list makes the module AZ-count-agnostic: it works for a dev stack with two AZs and a production stack with three, driven entirely by a variable. The cidrsubnet() calls add 8 bits to the VPC prefix, so with a /16 VPC CIDR each subnet is a /24.

# modules/network/main.tf

data "aws_availability_zones" "available" {
  state = "available"
}

locals {
  azs            = slice(data.aws_availability_zones.available.names, 0, var.az_count)
  public_cidrs   = [for i, az in local.azs : cidrsubnet(var.vpc_cidr, 8, i)]
  private_cidrs  = [for i, az in local.azs : cidrsubnet(var.vpc_cidr, 8, i + 10)]
}

resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true
  tags                 = merge(var.tags, { Name = "${var.name_prefix}-vpc" })
}

resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
  tags   = merge(var.tags, { Name = "${var.name_prefix}-igw" })
}

resource "aws_subnet" "public" {
  for_each = { for i, az in local.azs : az => { cidr = local.public_cidrs[i], idx = i } }

  vpc_id                  = aws_vpc.main.id
  cidr_block              = each.value.cidr
  availability_zone       = each.key
  map_public_ip_on_launch = true
  tags = merge(var.tags, { Name = "${var.name_prefix}-public-${each.value.idx + 1}", Tier = "public" })
}

resource "aws_subnet" "private" {
  for_each = { for i, az in local.azs : az => { cidr = local.private_cidrs[i], idx = i } }

  vpc_id            = aws_vpc.main.id
  cidr_block        = each.value.cidr
  availability_zone = each.key
  tags = merge(var.tags, { Name = "${var.name_prefix}-private-${each.value.idx + 1}", Tier = "private" })
}

resource "aws_eip" "nat" {
  for_each = aws_subnet.public
  domain   = "vpc"
  tags     = merge(var.tags, { Name = "${var.name_prefix}-nat-eip-${each.value.availability_zone}" })
}

resource "aws_nat_gateway" "main" {
  for_each      = aws_subnet.public
  allocation_id = aws_eip.nat[each.key].id
  subnet_id     = each.value.id
  tags          = merge(var.tags, { Name = "${var.name_prefix}-nat-${each.value.availability_zone}" })
  depends_on    = [aws_internet_gateway.main]
}

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }
  tags = merge(var.tags, { Name = "${var.name_prefix}-rt-public" })
}

resource "aws_route_table_association" "public" {
  for_each       = aws_subnet.public
  subnet_id      = each.value.id
  route_table_id = aws_route_table.public.id
}

resource "aws_route_table" "private" {
  for_each = aws_subnet.private
  vpc_id   = aws_vpc.main.id
  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main[each.key].id
  }
  tags = merge(var.tags, { Name = "${var.name_prefix}-rt-private-${each.value.availability_zone}" })
}

resource "aws_route_table_association" "private" {
  for_each       = aws_subnet.private
  subnet_id      = each.value.id
  route_table_id = aws_route_table.private[each.key].id
}

Three-AZ VPC layout: ALB nodes and NAT Gateways live in public subnets; EC2 instances (ASG) live in private subnets and reach the internet only via NAT.
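
The module's inputs and outputs are not shown above. The sketch below is one way variables.tf and outputs.tf could look, inferred from how the root module and the compute module use them: the compute module calls values() on the subnet ID inputs, so the outputs are assumed to be maps keyed by AZ name. The descriptions and the az_count validation are additions for illustration.

```hcl
# modules/network/variables.tf (sketch; descriptions and validation are assumed)

variable "name_prefix" {
  description = "Prefix applied to every resource Name tag."
  type        = string
}

variable "vpc_cidr" {
  description = "VPC CIDR block. A /16 is assumed so that 8 extra bits give /24 subnets."
  type        = string
  # Worked example for "10.0.0.0/16" with az_count = 3:
  #   public_cidrs  = ["10.0.0.0/24",  "10.0.1.0/24",  "10.0.2.0/24"]
  #   private_cidrs = ["10.0.10.0/24", "10.0.11.0/24", "10.0.12.0/24"]
}

variable "az_count" {
  description = "Number of availability zones to span."
  type        = number

  validation {
    # Private subnets start at index 10, so more than 10 AZs would make
    # public and private CIDR blocks collide.
    condition     = var.az_count >= 1 && var.az_count <= 10
    error_message = "az_count must be between 1 and 10."
  }
}

variable "tags" {
  description = "Tags applied to every resource in the module."
  type        = map(string)
  default     = {}
}

# modules/network/outputs.tf (sketch)

output "vpc_id" {
  description = "ID of the VPC."
  value       = aws_vpc.main.id
}

output "public_subnet_ids" {
  description = "Map of availability zone name to public subnet ID."
  value       = { for az, subnet in aws_subnet.public : az => subnet.id }
}

output "private_subnet_ids" {
  description = "Map of availability zone name to private subnet ID."
  value       = { for az, subnet in aws_subnet.private : az => subnet.id }
}
```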

Step 4 — The Compute Module (ALB + ASG)

The compute module wires together the Application Load Balancer in the public subnets, a launch template referencing the latest Amazon Linux 2023 AMI via a data source, and an Auto Scaling Group that spans the private subnets. The security group model is explicit and minimal: the ALB accepts port 443 from the internet, and EC2 instances accept port 443 only from the ALB's security group — never from 0.0.0.0/0.

# modules/compute/main.tf (abridged — key resources shown)

data "aws_ami" "al2023" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["al2023-ami-2023.*-x86_64"] # Standard images only; a bare "*" also matches minimal and ECS-optimized variants
  }
}

# --- Security Groups ---

resource "aws_security_group" "alb" {
  name        = "${var.name_prefix}-alb-sg"
  description = "Allow HTTPS inbound from the internet and all outbound traffic."
  vpc_id      = var.vpc_id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
    description = "HTTPS from internet"
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = merge(var.tags, { Name = "${var.name_prefix}-alb-sg" })
}

resource "aws_security_group" "ec2" {
  name        = "${var.name_prefix}-ec2-sg"
  description = "Allow HTTPS only from the ALB security group."
  vpc_id      = var.vpc_id

  ingress {
    from_port       = 443
    to_port         = 443
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
    description     = "HTTPS from ALB only"
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
    description = "Allow outbound (NAT GW egress)"
  }

  tags = merge(var.tags, { Name = "${var.name_prefix}-ec2-sg" })
}

# --- ALB ---

resource "aws_lb" "main" {
  name                       = "${var.name_prefix}-alb"
  internal                   = false
  load_balancer_type         = "application"
  security_groups            = [aws_security_group.alb.id]
  subnets                    = values(var.public_subnet_ids)
  idle_timeout               = 60
  drop_invalid_header_fields = true # Production security requirement

  access_logs {
    bucket  = var.access_log_bucket
    prefix  = "${var.name_prefix}-alb"
    enabled = true
  }

  tags = merge(var.tags, { Name = "${var.name_prefix}-alb" })
}

resource "aws_lb_target_group" "app" {
  name        = "${var.name_prefix}-tg"
  port        = 443
  protocol    = "HTTPS"
  vpc_id      = var.vpc_id
  target_type = "instance"

  health_check {
    path                = "/health"
    healthy_threshold   = 2
    unhealthy_threshold = 3
    timeout             = 5
    interval            = 30
    matcher             = "200"
  }

  tags = merge(var.tags, { Name = "${var.name_prefix}-tg" })
}

resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.main.arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  certificate_arn   = var.acm_certificate_arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.app.arn
  }
}

# --- Launch Template + ASG ---

resource "aws_launch_template" "app" {
  name_prefix   = "${var.name_prefix}-lt-"
  image_id      = data.aws_ami.al2023.id
  instance_type = var.instance_type

  network_interfaces {
    associate_public_ip_address = false
    security_groups             = [aws_security_group.ec2.id]
  }

  iam_instance_profile { name = var.instance_profile_name }

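  # NOTE: stock nginx serves plain HTTP on port 80. The target group above
  # expects HTTPS on 443 and a /health route, so bake a TLS listener and the
  # health endpoint into the AMI, or add them here, before relying on health checks.
  user_data = base64encode(<<-EOF
    #!/bin/bash
    dnf update -y
    dnf install -y nginx
    systemctl enable --now nginx
  EOF
  )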

  metadata_options {
    http_endpoint               = "enabled"
    http_tokens                 = "required"   # IMDSv2 only — security baseline
    http_put_response_hop_limit = 1
  }

  lifecycle { create_before_destroy = true }

  tags = merge(var.tags, { Name = "${var.name_prefix}-lt" })
}

resource "aws_autoscaling_group" "app" {
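  # name_prefix rather than a fixed name: create_before_destroy must be able
  # to create the replacement group while the old one still exists.
  name_prefix               = "${var.name_prefix}-asg-"
  min_size                  = var.asg_min
  max_size                  = var.asg_max
  desired_capacity          = var.asg_desired
  vpc_zone_identifier       = values(var.private_subnet_ids)
  health_check_type         = "ELB"
  health_check_grace_period = 120
  target_group_arns         = [aws_lb_target_group.app.arn]

  launch_template {
    id = aws_launch_template.app.id
    # A concrete version number (not "$Latest") changes whenever the template
    # changes, which is what triggers the instance refresh below.
    version = aws_launch_template.app.latest_version
  }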

  instance_refresh {
    strategy = "Rolling"
    preferences {
      min_healthy_percentage = 80
      instance_warmup        = 60
    }
  }

  dynamic "tag" {
    for_each = merge(var.tags, { Name = "${var.name_prefix}-app" })
    content {
      key                 = tag.key
      value               = tag.value
      propagate_at_launch = true
    }
  }

  lifecycle { create_before_destroy = true }
}
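
The root module reads module.compute.alb_dns_name and module.compute.asg_name, so the compute module needs matching outputs. The abridged listing omits them; a minimal sketch is shown here. The alb_zone_id output is an addition not referenced elsewhere in the lesson, included on the assumption that a later pipeline step creates a Route 53 alias record, which needs the load balancer's hosted zone ID alongside its DNS name.

```hcl
# modules/compute/outputs.tf (sketch)

output "alb_dns_name" {
  description = "DNS name of the Application Load Balancer."
  value       = aws_lb.main.dns_name
}

output "alb_zone_id" {
  # Assumed addition: needed only if a Route 53 alias record targets the ALB.
  description = "Canonical hosted zone ID of the ALB."
  value       = aws_lb.main.zone_id
}

output "asg_name" {
  description = "Name of the Auto Scaling Group."
  value       = aws_autoscaling_group.app.name
}
```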

Production pitfall (IMDSv1 on EC2): Omitting http_tokens = "required" in the launch template can leave IMDSv1 enabled. IMDSv1 answers plain GET requests from any process on the instance, so a server-side request forgery (SSRF) vulnerability in application code is enough to read the instance's IAM credentials. The Capital One breach (2019) used this path to extract IAM credentials. Enforce IMDSv2 in every launch template. Newer AMIs such as Amazon Linux 2023 default to IMDSv2, and AWS offers an account-level default, but older AMIs and existing accounts may still allow IMDSv1, so set it explicitly.

Step 5 — Root Module and Deployment Workflow

The root module wires the two child modules together, passing the network outputs into the compute module's inputs. It also emits the outputs consumed by subsequent CI pipeline steps: the ALB DNS name (used for the smoke test and for DNS record creation), the VPC ID, and the ASG name.

# main.tf (root module)

provider "aws" {
  region = var.aws_region

  default_tags {
    tags = local.common_tags
  }
}

module "network" {
  source      = "./modules/network"
  name_prefix = local.name_prefix
  vpc_cidr    = var.vpc_cidr
  az_count    = local.az_count
  tags        = local.common_tags
}

module "compute" {
  source              = "./modules/compute"
  name_prefix         = local.name_prefix
  vpc_id              = module.network.vpc_id
  public_subnet_ids   = module.network.public_subnet_ids
  private_subnet_ids  = module.network.private_subnet_ids
  instance_type       = var.instance_type
  acm_certificate_arn = var.acm_certificate_arn
  instance_profile_name = var.instance_profile_name
  access_log_bucket   = var.access_log_bucket
  asg_min             = var.asg_min
  asg_max             = var.asg_max
  asg_desired         = var.asg_desired
  tags                = local.common_tags
}

# outputs.tf
output "alb_dns_name" {
  description = "DNS name of the Application Load Balancer."
  value       = module.compute.alb_dns_name
}

output "vpc_id" {
  description = "VPC ID."
  value       = module.network.vpc_id
}

output "asg_name" {
  description = "Name of the Auto Scaling Group."
  value       = module.compute.asg_name
}

# ---
# Deployment commands (run by CI after plan is approved):

# Init (downloads providers, configures S3 backend):
terraform init \
  -backend-config="bucket=acme-terraform-state-prod" \
  -backend-config="key=web-stack/production/terraform.tfstate" \
  -backend-config="region=us-east-1"

# Plan (output saved as artifact for review gate; terraform.tfvars is loaded automatically):
terraform plan -out=tfplan

# Apply (uses the saved plan — no re-plan surprises):
terraform apply tfplan

# Post-apply smoke test. -k skips certificate verification because the ACM
# certificate is issued for the site's domain, not for the ALB hostname.
ALB=$(terraform output -raw alb_dns_name)
curl -sfk --retry 5 --retry-delay 10 "https://${ALB}/health" \
  || { echo "Smoke test failed; revert the change and re-apply the last known-good commit" >&2; exit 1; }
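
The root module's input variables referenced in main.tf are supplied through terraform.tfvars, which Terraform loads automatically. The lesson does not show that file, so the sketch below is illustrative only: every value, including the certificate ARN, bucket name, and instance profile name, is a placeholder to be replaced with your own.

```hcl
# terraform.tfvars (sketch; every value below is a placeholder)

aws_region   = "us-east-1"
environment  = "production"
project_name = "web"

vpc_cidr      = "10.0.0.0/16"
instance_type = "t3.small"

# Hypothetical identifiers; these resources must already exist.
acm_certificate_arn   = "arn:aws:acm:us-east-1:123456789012:certificate/EXAMPLE"
instance_profile_name = "web-instance-profile"
access_log_bucket     = "acme-alb-access-logs"

asg_min     = 2
asg_max     = 6
asg_desired = 2
```

Keep secrets out of this file. The directory layout reserves it for non-sensitive values, and it is normally committed alongside the code.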

Always apply a saved plan file in CI. Running terraform apply without a saved plan file makes Terraform create a fresh plan at apply time. Between human review and apply, another pipeline or a manual change could alter the infrastructure or the state, producing an apply that does not match what was reviewed. Saving the plan with -out=tfplan and applying that exact artifact ties the apply to the reviewed plan; if the state has changed in the meantime, Terraform rejects the saved plan as stale instead of applying it. HCP Terraform (formerly Terraform Cloud) follows the same model: the apply phase of a run executes the plan that was produced and reviewed earlier in that run.

Production Failure Modes to Know

After running dozens of Terraform-managed rollouts you will encounter predictable failure patterns. Knowing them in advance turns a midnight incident into a ten-minute fix:

State lock not released after an interrupted apply: Run terraform force-unlock <LOCK_ID>; the lock ID is shown in the error message. Before unlocking, confirm that the process that took the lock has actually exited (for example, a cancelled CI job). If another apply is genuinely still running, never force-unlock: two concurrent writers can corrupt the state.
Desired capacity drift in ASG: If a scaling policy or an operator changes the desired capacity outside Terraform, the next terraform plan shows a diff and the apply resets it. Add ignore_changes = [desired_capacity] to the ASG lifecycle block if desired capacity is managed by a scaling policy.
NAT Gateway EIP limit: AWS default is 5 EIPs per region. A three-AZ stack needs 3 EIPs for NAT Gateways. Across multiple environments in one region you hit the limit quickly — request a quota increase as part of the initial infrastructure setup, before the first apply.
AMI deregistration: If the AMI used by the launch template is deregistered, new ASG instances fail to launch but existing instances are unaffected. The fix is to update the launch template's AMI reference and trigger an instance refresh. Always pin launch templates to AMIs managed via AWS Image Builder or Packer pipelines, not to public AMIs that can be removed.
Provider version mismatch across workspaces: A colleague runs terraform init -upgrade and commits an updated .terraform.lock.hcl that pins a new provider version. Your CI picks it up on the next run. The new provider may have a breaking schema change for a resource you use. Solution: review lock file diffs in PRs with the same scrutiny as application code changes.
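
The desired-capacity item in the list above can be sketched as follows. This is a fragment of the compute module rather than a complete resource, and the target-tracking policy with its 60 percent CPU target is a hypothetical example of "a separate auto-scaling policy", not something the lesson defines.

```hcl
# modules/compute/main.tf (fragment; other ASG arguments unchanged from Step 4)

resource "aws_autoscaling_group" "app" {
  # ... sizes, subnets, launch_template, instance_refresh, tags as in Step 4 ...

  lifecycle {
    create_before_destroy = true
    # Let the scaling policy own the running count. Terraform still manages
    # min_size and max_size.
    ignore_changes = [desired_capacity]
  }
}

# Hypothetical policy that adjusts desired capacity at runtime.
resource "aws_autoscaling_policy" "cpu_target" {
  name                   = "${var.name_prefix}-cpu-target"
  autoscaling_group_name = aws_autoscaling_group.app.name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }
    target_value = 60 # Assumed target; tune to the workload.
  }
}
```

With ignore_changes in place, var.asg_desired only sets the initial capacity when the group is created; later changes to it are ignored by plan.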

Module versioning in team environments: In solo or small-team projects it is acceptable to reference modules via relative paths (./modules/network). In large organizations, modules are published to a private Terraform registry or to a Git repository with tagged releases, and callers pin to a semantic version: source = "git::https://github.com/acme/terraform-modules.git//network?ref=v2.3.0". This ensures that a module change in one team's branch does not silently break another team's infrastructure on their next init.
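
As a sketch, the root module's network call would change as follows when the module moves from a relative path to a tagged Git release. The repository URL and tag are the illustrative ones from the paragraph above, and the inputs are the same as in Step 5.

```hcl
# main.tf (root module; sketch using the example repository and tag above)

module "network" {
  # Pinned to a tagged release. Upgrading means changing ref in a reviewed
  # commit and re-running terraform init.
  source = "git::https://github.com/acme/terraform-modules.git//network?ref=v2.3.0"

  name_prefix = local.name_prefix
  vpc_cidr    = var.vpc_cidr
  az_count    = local.az_count
  tags        = local.common_tags
}
```

The double slash separates the repository from the subdirectory that holds the module, and ref accepts any Git reference. A version tag is preferred over a branch name because a branch can move between runs.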