Cloud Fundamentals: AWS Core Services

Auto Scaling Groups & ELB

Any production system designed for a fixed number of instances is designed to fail at scale. Auto Scaling Groups (ASGs) and Elastic Load Balancing (ELB) are the two primitives that make AWS workloads self-healing and horizontally scalable. Together with Launch Templates, they form a trio that every DevOps engineer must understand in depth, not just to pass an exam, but because misconfiguring any one of them is a common cause of production outages.

This lesson covers the full picture: how Launch Templates describe what to launch, how ASGs decide when and how many to launch, and how Target Groups with health checks ensure only healthy instances receive traffic — with a focus on real production failure modes along the way.

Launch Templates: The Immutable Blueprint

A Launch Template (LT) is a versioned, immutable specification of everything needed to boot an EC2 instance: AMI ID, instance type, key pair, security groups, IAM instance profile, user data, EBS volumes, and network settings. It replaces the older Launch Configuration (now deprecated — do not use it for new workloads).

The key advantage of Launch Templates over Launch Configurations is versioning. Each template version gets a number, and the aliases $Default and $Latest point to the designated default version and the newest version. Your ASG can pin to a specific version number so that a botched AMI update does not automatically roll out to production. This matters enormously at scale: in mature setups, the LT version is controlled by your CI/CD pipeline and promoted through dev → staging → production gates.

## Create a Launch Template via AWS CLI
aws ec2 create-launch-template \
  --launch-template-name "myapp-prod-lt" \
  --version-description "v1.0 — initial golden AMI bake" \
  --launch-template-data '{
    "ImageId": "ami-0a1b2c3d4e5f67890",
    "InstanceType": "m7g.large",
    "IamInstanceProfile": {
      "Arn": "arn:aws:iam::123456789012:instance-profile/myapp-ec2-profile"
    },
    "SecurityGroupIds": ["sg-0abc123def456"],
    "UserData": "IyEvYmluL2Jhc2gKc2V0IC1ldW8gcGlwZWZhaWw=",
    "BlockDeviceMappings": [
      {
        "DeviceName": "/dev/xvda",
        "Ebs": {
          "VolumeSize": 30,
          "VolumeType": "gp3",
          "Iops": 3000,
          "Encrypted": true,
          "DeleteOnTermination": true
        }
      }
    ],
    "MetadataOptions": {
      "HttpTokens": "required",
      "HttpPutResponseHopLimit": 1
    },
    "TagSpecifications": [
      {
        "ResourceType": "instance",
        "Tags": [
          {"Key": "Name", "Value": "myapp-prod"},
          {"Key": "Environment", "Value": "production"},
          {"Key": "ManagedBy", "Value": "ASG"}
        ]
      }
    ]
  }'
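
The UserData value in the template above looks opaque because the EC2 API expects user data to be base64-encoded. As a minimal sketch, the snippet below decodes that exact string to show what the instance would run, then re-encodes it. It assumes a `base64` binary that accepts `--decode` (GNU coreutils and current macOS do); the variable names are illustrative.

```shell
#!/usr/bin/env bash
# UserData value copied from the launch template above.
user_data_b64="IyEvYmluL2Jhc2gKc2V0IC1ldW8gcGlwZWZhaWw="

# Decode it to see the script the instance runs at first boot.
decoded="$(printf '%s' "$user_data_b64" | base64 --decode)"
printf '%s\n' "$decoded"

# Going the other way: encode a script for the "UserData" field.
# tr strips the line wrapping that some base64 builds add to long output.
encoded="$(printf '%s' "$decoded" | base64 | tr -d '\n')"
printf '%s\n' "$encoded"
```

The decoded script here is only a shebang line followed by `set -euo pipefail`; a real bootstrap script would continue from there before being encoded.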

Treat IMDSv2 as mandatory in production. The MetadataOptions.HttpTokens: required setting enforces IMDSv2 (token-based metadata access). IMDSv1 is exploitable via SSRF: if an attacker can make your app issue an HTTP GET request to 169.254.169.254, they can steal the instance role credentials. IMDSv2 requires a PUT request with a custom header to obtain a session token first, which a typical SSRF vulnerability cannot issue. Set it in every Launch Template you create, no exceptions.
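
The two-step token flow described above can be sketched with curl. This only works from inside an EC2 instance; the function names `imds_token` and `imds_get` are illustrative helpers, not part of any AWS tool, and the 21600-second TTL is just one allowed value.

```shell
#!/usr/bin/env bash
# Sketch of the IMDSv2 request flow. Only reachable from inside an EC2 instance.
IMDS="http://169.254.169.254"

# Step 1: a PUT request carrying a TTL header returns a session token.
imds_token() {
  curl -sf --max-time 2 -X PUT "$IMDS/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 21600"
}

# Step 2: every metadata read must present that token in a request header.
imds_get() {
  local path="$1" token
  token="$(imds_token)" || return 1
  curl -sf --max-time 2 -H "X-aws-ec2-metadata-token: $token" \
    "$IMDS/latest/meta-data/$path"
}
```

On an instance, `imds_get instance-id` would print the instance ID. With HttpTokens set to required, a plain GET without the token header is rejected.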

Auto Scaling Groups: Self-Healing Fleets

An Auto Scaling Group maintains a fleet of EC2 instances between a configured minimum and maximum count. It uses your Launch Template to boot new instances and terminates old ones based on scaling policies and health check results. The ASG is the entity that glues together the LT (what to run), the Target Group (where to register), and CloudWatch metrics (when to scale).

Key ASG configuration parameters:

MinSize / MaxSize / DesiredCapacity — the floor, ceiling, and current target instance count. In production, never set MinSize to 0 unless the workload is truly batch/off-hours.
Multi-AZ distribution — always span at least two Availability Zones. The ASG's AvailabilityZones or VPCZoneIdentifier (subnet IDs) controls this. When AZ-a fails, the ASG automatically launches replacements in AZ-b.
Health check type — either EC2 (instance status checks only) or ELB (the load balancer's health check result). Always use ELB in production: an instance can pass EC2 status checks while your application is totally broken.
Warmup / cooldown — DefaultInstanceWarmup tells the ASG how long to wait for a new instance to be ready before counting its metrics toward scaling decisions. Without this, CloudWatch sees a momentarily high load average during bootstrap and over-provisions.

## Create an ASG tied to the Launch Template and a Target Group
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name "myapp-prod-asg" \
  --launch-template "LaunchTemplateName=myapp-prod-lt,Version=1" \
  --min-size 2 \
  --max-size 20 \
  --desired-capacity 4 \
  --vpc-zone-identifier "subnet-aaa111,subnet-bbb222,subnet-ccc333" \
  --target-group-arns "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/myapp-prod-tg/abc123" \
  --health-check-type ELB \
  --health-check-grace-period 120 \
  --default-instance-warmup 90 \
  --termination-policies "OldestLaunchTemplate" "AllocationStrategy" \
  --capacity-rebalance

## Attach a target-tracking scaling policy (keep average CPU near 60%)
## New-instance warmup comes from the ASG's --default-instance-warmup setting;
## EC2 Auto Scaling target tracking does not accept per-policy cooldown fields.
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name "myapp-prod-asg" \
  --policy-name "myapp-cpu-tracking" \
  --policy-type TargetTrackingScaling \
  --target-tracking-configuration '{
    "TargetValue": 60.0,
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ASGAverageCPUUtilization"
    },
    "DisableScaleIn": false
  }'

Use target-tracking policies rather than step scaling for most services: they are self-adjusting and far simpler to reason about. Reserve step scaling for workloads where you have strong opinions about the exact step sizes (e.g., you know from load testing that going from 60% to 80% CPU needs exactly 3 extra instances, not a proportional amount). For latency-sensitive services, track request count per target on the ALB (the ALBRequestCountPerTarget predefined metric) rather than CPU: it reacts faster and correlates more directly with user experience.
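
To build intuition for what "self-adjusting" means, here is a rough model of target tracking: capacity changes in proportion to how far the metric sits from the target, clamped to the group's limits. This is an approximation for reasoning only, not AWS's exact algorithm, and `estimate_desired` is a hypothetical helper. Inputs are whole-number percentages.

```shell
#!/usr/bin/env bash
# Rough model of target tracking, for intuition only (not AWS's exact algorithm):
# capacity scales in proportion to metric / target, clamped to min and max.
estimate_desired() {
  local current="$1" metric="$2" target="$3" min="$4" max="$5"
  # Integer ceiling of current * metric / target.
  local desired=$(( (current * metric + target - 1) / target ))
  if (( desired < min )); then desired=$min; fi
  if (( desired > max )); then desired=$max; fi
  echo "$desired"
}

# 4 instances averaging 90% CPU against the 60% target used above.
estimate_desired 4 90 60 2 20   # prints 6
# 4 instances at 10% CPU: the MinSize of 2 limits the scale-in.
estimate_desired 4 10 60 2 20   # prints 2
```

Step scaling, by contrast, would need an explicit rule for each of these cases.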

Target Groups and Health Checks: The Contract Between ALB and ASG

A Target Group (TG) is the mechanism by which an Application Load Balancer (ALB) or Network Load Balancer (NLB) knows which instances (or IPs, or Lambda functions) to route requests to. When you attach a TG to an ASG, every instance the ASG launches is automatically registered in the TG; every instance the ASG terminates is deregistered. This is seamless — but health checks are what make it safe.

A Target Group health check continuously polls each registered target on a configured path and port. Results:

Healthy: the target is in the rotation and receives traffic.
Unhealthy: the target has failed the configured number of consecutive health checks and is removed from rotation. If the ASG's health check type is ELB, the ASG also treats the instance as unhealthy and terminates and replaces it.
Initial: the target was just registered and the load balancer is still running its first health checks; it receives no traffic until it passes them.
Draining (deregistration delay): when a target is deregistered, for example because its instance is marked for termination, the load balancer stops sending it new requests and waits up to deregistration_delay.timeout_seconds for in-flight requests to complete before closing connections. The default is 300 s, which is too long for most apps; tune it to 30–60 s so rolling deployments finish faster.

## Create a Target Group with a meaningful health check
aws elbv2 create-target-group \
  --name "myapp-prod-tg" \
  --protocol HTTP \
  --port 8080 \
  --vpc-id vpc-0abc1234 \
  --target-type instance \
  --health-check-protocol HTTP \
  --health-check-path /health \
  --health-check-interval-seconds 15 \
  --health-check-timeout-seconds 5 \
  --healthy-threshold-count 2 \
  --unhealthy-threshold-count 3 \
  --matcher HttpCode=200

## Reduce deregistration delay to speed up rolling deploys
aws elbv2 modify-target-group-attributes \
  --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/myapp-prod-tg/abc123 \
  --attributes Key=deregistration_delay.timeout_seconds,Value=30

## Create the ALB HTTPS listener whose default action forwards to the TG
aws elbv2 create-listener \
  --load-balancer-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/app/myapp-alb/xyz \
  --protocol HTTPS \
  --port 443 \
  --ssl-policy ELBSecurityPolicy-TLS13-1-2-2021-06 \
  --certificates CertificateArn=arn:aws:acm:us-east-1:123456789012:certificate/abc \
  --default-actions Type=forward,TargetGroupArn=arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/myapp-prod-tg/abc123
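
The health check settings in the create-target-group command translate directly into detection times. As a quick worked calculation (approximate, since it ignores the timeout and where in the interval a failure starts):

```shell
#!/usr/bin/env bash
# Health check settings from the create-target-group command above.
interval=15
healthy_threshold=2
unhealthy_threshold=3

# Each threshold counts consecutive checks, and checks run once per interval.
time_to_healthy=$(( interval * healthy_threshold ))
time_to_unhealthy=$(( interval * unhealthy_threshold ))

echo "new target in rotation after about ${time_to_healthy}s"
echo "failing target out of rotation after about ${time_to_unhealthy}s"
```

With these values a new target needs about 30 s of passing checks and a failing one is pulled after about 45 s, which is worth keeping in mind when choosing the ASG's health check grace period.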

A shallow health check endpoint is a production time bomb. If your /health path returns HTTP 200 without checking whether the application can connect to the database, call downstream services, or read its config, the ALB will happily route traffic to a broken instance that returns 200 on health checks and 500 on real requests. Build a deep health check that validates all critical dependencies, and make it fast (under 2 s, well inside the 5 s health check timeout). Separate the deep check from the liveness check if your app has a slow startup.
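
The decision logic behind a deep health check can be sketched as "run every dependency check, report 200 only if all of them pass". The sketch below is not tied to any web framework; `health_status`, `check_config`, and `check_database` are hypothetical names, and the two checks are placeholders standing in for real probes.

```shell
#!/usr/bin/env bash
# Sketch of the decision logic behind a deep /health endpoint.
# The checks below are hypothetical placeholders for real dependency probes.
check_config()   { return 0; }   # placeholder: pretend the config is readable
check_database() { return 1; }   # placeholder: pretend the database is down

# Run each named check in order; report 503 on the first failure.
health_status() {
  local check
  for check in "$@"; do
    if ! "$check"; then
      echo "503 failed: $check"
      return 1
    fi
  done
  echo "200 ok"
}

health_status check_config                  # prints "200 ok"
health_status check_config check_database   # prints "503 failed: check_database"
```

In a real service each check should carry its own short timeout so the whole endpoint stays within the load balancer's health check timeout.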

The Trio: How Launch Template, ASG, and Target Group Work Together

Understanding the flow of a request, from the user's browser to your application running on an automatically provisioned instance, requires understanding how all three components interact.

[Figure: The trio in production. The ALB forwards requests to healthy instances registered in the Target Group; the ASG uses the Launch Template to provision replacements and scales the fleet based on CloudWatch alarms.]

Instance Refresh: Zero-Downtime AMI Rollouts

When you bake a new AMI (e.g., after a security patch), you need to replace all running instances without dropping traffic. The ASG Instance Refresh feature handles this automatically: it replaces instances in batches, keeps at least MinHealthyPercentage of the group in service, and deregisters each instance from the TG (with connection draining) before terminating it.

## Trigger an instance refresh to roll out a new Launch Template version
aws autoscaling start-instance-refresh \
  --auto-scaling-group-name "myapp-prod-asg" \
  --preferences '{
    "MinHealthyPercentage": 90,
    "InstanceWarmup": 120,
    "CheckpointPercentages": [33, 66, 100],
    "CheckpointDelay": 600,
    "SkipMatching": false
  }' \
  --desired-configuration '{
    "LaunchTemplate": {
      "LaunchTemplateName": "myapp-prod-lt",
      "Version": "2"
    }
  }'

## Watch the refresh status
aws autoscaling describe-instance-refreshes \
  --auto-scaling-group-name "myapp-prod-asg" \
  --query 'InstanceRefreshes[0].{Status:Status,Percentage:PercentageComplete}' \
  --output table

Use Checkpoint-based refreshes for large fleets. Setting CheckpointPercentages to [33, 66, 100] with a CheckpointDelay of 600 seconds means the refresh pauses at 33% and 66% completion. This gives your monitoring time to catch regressions before the entire fleet is rolled. Combined with automated canary checks in your deployment pipeline, this pattern gives you near-zero-downtime blue/green-like safety with far simpler infrastructure than maintaining two full ASGs.
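
To see how MinHealthyPercentage limits the pace of a refresh, the following is a rough estimate of how many instances can be out of service at once. It is an approximation under the assumption that the required healthy count is rounded up; AWS's exact rounding may differ, and `max_batch` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Rough estimate of how many instances a refresh can replace at once.
# Approximation only; AWS's exact rounding may differ.
max_batch() {
  local capacity="$1" min_healthy_pct="$2"
  # Instances that must stay in service, rounded up.
  local must_keep=$(( (capacity * min_healthy_pct + 99) / 100 ))
  local batch=$(( capacity - must_keep ))
  # A refresh still makes progress one instance at a time when the math gives 0.
  if (( batch < 1 )); then batch=1; fi
  echo "$batch"
}

max_batch 20 90   # prints 2
max_batch 4 90    # prints 1
```

For the 4-instance group in this lesson, a 90% setting therefore means one instance at a time, so the refresh duration is dominated by the per-instance warmup and draining time.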

Production Failure Modes to Know

These are the most common ASG/ELB failure scenarios encountered in real production environments:

Health check grace period too short: if HealthCheckGracePeriod is shorter than your application startup time, the ASG marks newly launched instances as unhealthy and terminates them before they finish booting. The fleet then thrashes in a launch/terminate loop. Set the grace period to at least 1.5x your p99 startup time.
Deregistration delay too long: the default 300 s delay slows rolling deployments. A deregistering target is reported as draining until the timeout expires, so each replacement can wait the full 5 minutes even when no requests are in flight. Tune this to 30–60 s for stateless HTTP services.
Scale-in termination targeting the wrong instance: the default termination policy removes instances in the AZ with the most instances, then the oldest LT version, then the one nearest its next billing hour. This is usually correct, but if you have stateful instances (e.g., a Kafka broker in the group), add a lifecycle hook on termination to gracefully drain the broker before the instance is killed.
Capacity Rebalancing for Spot not enabled: if you use Spot Instances in a mixed-instances policy, enable Capacity Rebalancing on the ASG (--capacity-rebalance in the CLI). Without it, the ASG replaces a Spot Instance only after it has been interrupted, and the interruption notice arrives just 2 minutes ahead, so capacity drops abruptly. With it, the ASG launches a replacement as soon as EC2 sends a rebalance recommendation, which can arrive before the 2-minute notice.
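
For the termination lifecycle hook mentioned in the list above, a minimal sketch of registering one on the ASG from this lesson follows. The hook name and the 300 s heartbeat timeout are illustrative assumptions. The script defaults to a dry run that only prints the command; set DRY_RUN=0 to actually call AWS.

```shell
#!/usr/bin/env bash
# Sketch: register a termination lifecycle hook on the ASG from this lesson.
# The hook name and 300 s timeout are illustrative assumptions.
# DRY_RUN=1 (the default here) prints the command instead of calling AWS.
DRY_RUN="${DRY_RUN:-1}"

cmd=(aws autoscaling put-lifecycle-hook
  --auto-scaling-group-name "myapp-prod-asg"
  --lifecycle-hook-name "drain-before-terminate"
  --lifecycle-transition "autoscaling:EC2_INSTANCE_TERMINATING"
  --heartbeat-timeout 300
  --default-result CONTINUE)

if [ "$DRY_RUN" = "1" ]; then
  echo "${cmd[*]}"
else
  "${cmd[@]}"
fi
```

With the hook in place, a terminating instance pauses until your drain script calls `aws autoscaling complete-lifecycle-action` or the heartbeat timeout expires; the CONTINUE default result lets termination proceed in either case.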