Cloud Concepts & the AWS Global Infrastructure
Every DevOps engineer who has on-call responsibilities eventually confronts a question at 2 AM: Is this an application bug, a network issue, or did an entire AWS data centre just go offline? Answering that question — and designing systems that survive the answer being "yes, the data centre is down" — requires a precise mental model of how AWS is physically and logically organised. This lesson builds that model from the ground up.
The Three Cloud Service Models
Before diving into AWS geography, it is worth anchoring the vocabulary you will see everywhere in job descriptions and AWS documentation.
- IaaS (Infrastructure as a Service): You rent raw compute, storage, and networking. AWS EC2, EBS, and VPC are IaaS. You manage the OS, runtime, and application; AWS manages the physical hardware.
- PaaS (Platform as a Service): The provider manages the runtime and OS. AWS Elastic Beanstalk and RDS are PaaS. You deploy code or a schema; AWS patches the underlying system.
- SaaS (Software as a Service): A fully managed application. AWS WorkMail is an example, and so is GitHub Actions (not an AWS product, but the concept is the same). You consume the feature, not the infrastructure.
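To make the split concrete, here is a simplified responsibility matrix for the three models as described above. The four layer names and the two-party "provider"/"you" labels are an illustrative simplification, not an official AWS taxonomy. Real services blur the boundaries: with RDS, for example, you still own the schema and the queries.

```python
"""Simplified responsibility matrix for IaaS, PaaS and SaaS.

The layer names and the two-party split are illustrative assumptions,
not an official AWS taxonomy.
"""

LAYERS = ("hardware", "os", "runtime", "application")

# Who manages each layer under each service model.
RESPONSIBILITY = {
    "IaaS": {"hardware": "provider", "os": "you", "runtime": "you", "application": "you"},
    "PaaS": {"hardware": "provider", "os": "provider", "runtime": "provider", "application": "you"},
    "SaaS": {"hardware": "provider", "os": "provider", "runtime": "provider", "application": "provider"},
}


def your_layers(model: str) -> list[str]:
    """Return the layers the customer manages under the given model."""
    return [layer for layer in LAYERS if RESPONSIBILITY[model][layer] == "you"]


if __name__ == "__main__":
    for model in RESPONSIBILITY:
        layers = your_layers(model)
        print(f"{model}: you manage {', '.join(layers) if layers else 'nothing (you just use the feature)'}")
```

Moving from IaaS to SaaS shrinks the list of layers you patch and operate, which is the trade-off every service choice in this lesson ultimately comes back to.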
Regions
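AWS divides the world into Regions: large geographic areas, each fully independent. As of 2025, there are 34 launched regions (e.g. us-east-1 Northern Virginia, eu-west-1 Ireland, ap-southeast-1 Singapore). Each region is a separate blast radius: regions are isolated by design, so a failure inside us-east-1 should not propagate to eu-west-1. The caveat is global services whose control plane lives in a single region, such as IAM, whose control plane runs in us-east-1.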
Regions matter for three reasons beyond resilience:
- Latency: You want your compute close to your users. A European SaaS with all workloads in us-east-1 adds 80–120 ms of round-trip latency to every API call.
- Data residency & compliance: GDPR restricts transfers of EU personal data outside the EU/EEA, so keeping that data in eu-central-1 (Frankfurt) or eu-west-1 (Ireland) is the simplest way to stay compliant. AWS does not move data you store in a region out of that region unless you explicitly replicate or transfer it.
- Service availability: Not every AWS service launches in every region simultaneously. us-east-1 usually gets new services first, and it is also the region with the longest history of high-profile outages.
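Much of that latency figure follows from physics. As a back-of-the-envelope sketch: light in optical fibre covers roughly 200 km per millisecond, so the great-circle distance between Frankfurt and Northern Virginia sets a floor on round-trip time before routing detours, queuing, or TLS handshakes are added. The city coordinates are approximations standing in for the two regions (an assumption; AWS does not publish exact data-centre locations).

```python
import math

FIBRE_KM_PER_MS = 200.0  # approx. speed of light in glass (about 2/3 of c)
EARTH_RADIUS_KM = 6371.0

# Approximate city coordinates (assumption): Frankfurt, and Ashburn, Virginia.
FRANKFURT = (50.11, 8.68)
ASHBURN = (39.04, -77.49)


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance in km between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def min_rtt_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over a perfectly straight fibre path."""
    return 2 * distance_km / FIBRE_KM_PER_MS


if __name__ == "__main__":
    d = great_circle_km(*FRANKFURT, *ASHBURN)
    print(f"distance ~{d:.0f} km, minimum RTT ~{min_rtt_ms(d):.0f} ms")
```

This gives a floor of roughly 65 ms. Real fibre routes are longer than the great circle and add network hops, so observed round trips sit above that floor, and no amount of application tuning removes it.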
Avoid running production with us-east-1 as your only region. It is where new AWS features land first, which also means it is the first region to absorb the blast radius of new changes. For anything requiring four-nines uptime, deploy an active-active or active-passive configuration across two regions.
Availability Zones
Within each region, AWS provides between 2 and 6 Availability Zones (AZs). Each AZ consists of one or more physically separate data centres with independent power, cooling, and networking, and the AZs in a region are connected to each other by dedicated, low-latency fibre links (single-digit milliseconds). The AZ naming convention appends a letter to the region code: us-east-1a, us-east-1b, us-east-1c.
AZs are the primary mechanism for achieving high availability within a region. A common rule of thumb: run stateless services as at least 2 replicas spread across at least 2 AZs. For databases, run a primary in one AZ and a standby (or read replicas) in at least one other.
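The replica rule can be checked mechanically. The sketch below uses hypothetical helper names and assumes simple round-robin placement. It spreads replicas across AZs and computes how much capacity survives the loss of any single AZ, which is the number that matters when sizing: with k evenly loaded AZs you keep roughly (k-1)/k of the fleet.

```python
import math
from collections import Counter


def spread_replicas(replicas: int, azs: list[str]) -> dict[str, int]:
    """Place replicas round-robin across AZs so per-AZ counts differ by at most one."""
    if replicas < 2 or len(azs) < 2:
        raise ValueError("run at least 2 replicas across at least 2 AZs")
    return dict(Counter(azs[i % len(azs)] for i in range(replicas)))


def surviving_fraction(placement: dict[str, int]) -> float:
    """Worst-case share of replicas still running after any single AZ is lost."""
    total = sum(placement.values())
    return (total - max(placement.values())) / total


def replicas_needed(required: int, az_count: int) -> int:
    """Smallest round-robin replica count that keeps `required` alive after one AZ fails."""
    if az_count < 2:
        raise ValueError("surviving an AZ loss needs at least 2 AZs")
    n = required
    # With round-robin placement the busiest AZ holds ceil(n / az_count) replicas.
    while n - math.ceil(n / az_count) < required:
        n += 1
    return n


if __name__ == "__main__":
    placement = spread_replicas(6, ["us-east-1a", "us-east-1b", "us-east-1c"])
    print(placement)  # {'us-east-1a': 2, 'us-east-1b': 2, 'us-east-1c': 2}
    print(surviving_fraction(placement))
    print(replicas_needed(4, 3), replicas_needed(4, 2))  # 6 8
```

The last line shows why a third AZ pays off: keeping 4 replicas through an AZ outage costs 6 replicas across three AZs but 8 across two.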
The AZ name us-east-1a in your AWS account is not necessarily the same physical AZ as us-east-1a in a colleague's account: AWS maps AZ names to physical locations independently per account to distribute load across facilities. Use AZ IDs (e.g. use1-az1) when coordinating across accounts, because an ID always refers to the same physical location. This trips up teams doing cross-account capacity reservations.
Edge Locations & the CloudFront Network
Regions and AZs handle compute and storage. For content delivery — static assets, API acceleration, and DDoS mitigation — AWS provides a separate tier: Edge Locations, the physical Points of Presence (PoPs) where Amazon CloudFront caches content and where AWS Shield absorbs volumetric attacks.
As of 2025, AWS operates 600+ edge locations across 90+ cities globally, plus 13 Regional Edge Caches that sit between your origin (S3, ALB, EC2) and the edge nodes. The tiered cache structure means that a cache miss at an edge PoP checks the regional edge cache before hitting your origin — dramatically reducing origin load for popular content.
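The effect of the extra tier is easiest to see in a toy model. The sketch below is a deliberate simplification (an assumption, not CloudFront's actual algorithm): all edge PoPs share one Regional Edge Cache, and TTLs, eviction, and uncacheable requests are ignored. The PoP names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TieredCache:
    """Toy model: edge PoPs -> one shared Regional Edge Cache -> origin.

    Ignores TTLs, eviction and uncacheable requests (simplifying assumptions).
    """

    edges: dict[str, set[str]] = field(default_factory=dict)  # PoP name -> cached keys
    regional: set[str] = field(default_factory=set)           # keys in the regional tier
    origin_fetches: int = 0

    def get(self, pop: str, key: str) -> str:
        edge = self.edges.setdefault(pop, set())
        if key in edge:
            return "edge hit"
        edge.add(key)  # the edge PoP caches the object on the way back
        if key in self.regional:
            return "regional hit"
        self.regional.add(key)  # the regional tier caches it too
        self.origin_fetches += 1
        return "origin fetch"


if __name__ == "__main__":
    cdn = TieredCache()
    for pop in ["fra", "cdg", "ams", "fra"]:  # hypothetical PoP names
        print(pop, cdn.get(pop, "/img/logo.png"))
    print("origin fetches:", cdn.origin_fetches)  # 1
```

Three PoPs miss on their first request, yet the origin serves the object only once. Without the regional tier, each of those first misses would have gone to the origin.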
Other services that use the edge network:
- Route 53: AWS's DNS service answers queries from a global Anycast network, so each query reaches the nearest Route 53 name server and resolves with low latency worldwide.
- AWS Global Accelerator: Routes TCP/UDP traffic to the nearest AWS edge node and keeps it on the private AWS backbone — bypassing the congested public internet for the long haul, reducing latency and packet loss for real-time applications.
- Lambda@Edge / CloudFront Functions: Run lightweight code close to users (CloudFront Functions at the edge locations themselves, Lambda@Edge at the Regional Edge Caches) for request manipulation, A/B testing, and auth validation without a round-trip to the origin region.
Querying the Infrastructure Programmatically
Understanding the physical layout lets you make architectural decisions in code. The AWS CLI exposes this metadata directly (for example through aws ec2 describe-regions and aws ec2 describe-availability-zones); use it to audit your infrastructure and build automation.
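For example, `aws ec2 describe-availability-zones --region us-east-1 --output json` returns each zone's `ZoneName`, `ZoneId`, and `State`. The sketch below parses that JSON and uses the AZ ID to translate a zone name from one account into another, which is the cross-account pitfall described in the Availability Zones section. Both JSON payloads are hypothetical, trimmed sample outputs with invented name-to-ID pairings, and the helper names are illustrative; run the command in each account to get the real mapping.

```python
import json

# Hypothetical, trimmed output of
#   aws ec2 describe-availability-zones --region us-east-1 --output json
# in two different accounts. The name-to-ID pairings are invented.
ACCOUNT_A = """{"AvailabilityZones": [
  {"ZoneName": "us-east-1a", "ZoneId": "use1-az6", "State": "available"},
  {"ZoneName": "us-east-1b", "ZoneId": "use1-az1", "State": "available"},
  {"ZoneName": "us-east-1c", "ZoneId": "use1-az2", "State": "available"}
]}"""
ACCOUNT_B = """{"AvailabilityZones": [
  {"ZoneName": "us-east-1a", "ZoneId": "use1-az1", "State": "available"},
  {"ZoneName": "us-east-1b", "ZoneId": "use1-az2", "State": "available"},
  {"ZoneName": "us-east-1c", "ZoneId": "use1-az6", "State": "available"}
]}"""


def name_to_id(cli_json: str) -> dict[str, str]:
    """Map account-specific AZ names to stable AZ IDs for zones that are available."""
    zones = json.loads(cli_json)["AvailabilityZones"]
    return {z["ZoneName"]: z["ZoneId"] for z in zones if z["State"] == "available"}


def translate_zone(zone_name: str, src_json: str, dst_json: str) -> str:
    """Return the name the destination account uses for the source account's zone."""
    zone_id = name_to_id(src_json)[zone_name]
    id_to_dst_name = {zid: name for name, zid in name_to_id(dst_json).items()}
    return id_to_dst_name[zone_id]


if __name__ == "__main__":
    # With this sample data, account A's us-east-1a is account B's us-east-1c.
    print(translate_zone("us-east-1a", ACCOUNT_A, ACCOUNT_B))
```

For a quick manual check, the CLI's JMESPath filter gives the same pairs without any code: `aws ec2 describe-availability-zones --query "AvailabilityZones[].[ZoneName,ZoneId]" --output table`.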
Production Failure Modes to Internalise
The architecture diagrams in textbooks show pristine boxes and arrows. Real AWS incidents look different. Three failure modes every DevOps engineer must design around:
- Single-AZ failure: Happens multiple times per year across the global fleet. If your Auto Scaling Group, RDS, or ECS service is pinned to one AZ, a power or network event there takes you offline. Design for AZ-level isolation from day one.
- Regional impairment: us-east-1 has had multiple significant events since 2011, including the November 2020 Kinesis Data Streams outage, which cascaded into dependent services such as Cognito and CloudWatch. True resilience requires a multi-region active-passive or active-active strategy for Tier-1 workloads.
- The "it's just DNS" trap: Route 53 uses a global Anycast network and is one of the most reliable services AWS offers, but DNS TTLs still catch engineers off-guard during failovers, because resolvers keep serving a cached record until its TTL expires. Set low TTLs (60s) on records that may need to change quickly in an incident.
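A rough model of why TTL matters: with health-check-driven DNS failover, clients can keep reaching a failed endpoint for about (health-check interval × failure threshold) plus the record's TTL. The defaults below (30-second standard checks, a failure threshold of 3, and 10-second fast checks as the alternative) reflect Route 53's health-check options, but treat the result as an estimate. It ignores how Route 53 aggregates results from multiple checker locations and resolvers that cap or ignore TTLs. The function name is hypothetical.

```python
def worst_case_failover_s(ttl_s: int, interval_s: int = 30, failure_threshold: int = 3) -> int:
    """Rough upper bound on seconds clients may keep reaching a failed endpoint.

    detection: consecutive failed health checks before the record fails over
    caching:   resolvers may keep serving the old answer until its TTL expires
    Ignores resolvers that cap, extend, or ignore TTLs (simplifying assumption).
    """
    detection_s = interval_s * failure_threshold
    return detection_s + ttl_s


if __name__ == "__main__":
    for ttl in (60, 300, 3600):
        print(f"TTL {ttl:>4}s -> up to ~{worst_case_failover_s(ttl)}s")
    print("fast checks, TTL 60s ->", worst_case_failover_s(60, interval_s=10), "s")
```

With a one-hour TTL, the caching term dwarfs detection time; dropping to 60s brings the estimate down to a couple of minutes, and fast health checks trim it further.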
Record the Availability Zone on every resource, for example as an availability-zone tag (user-created tags cannot use the reserved aws: prefix) or, on EC2, by reading it from the instance metadata service. During an incident, the first question is "which AZ is affected?", and having your CMDB or observability tooling answer that in seconds rather than minutes can be the difference between a 5-minute and a 45-minute MTTR.
Key Takeaways
- AWS operates 34 independent Regions — geographic blast-radius boundaries. Choose regions for latency, compliance, and multi-region resilience strategy.
- Each Region has 2–6 Availability Zones: physically separate data centres with independent power, cooling, and networking, connected by single-digit-millisecond fibre links. AZs are your primary HA mechanism within a region.
- 600+ Edge Locations power CloudFront, Route 53, Shield, and Global Accelerator — getting content and DNS close to users worldwide.
- The Shared Responsibility Model divides security obligations: AWS owns the infrastructure; you own everything you put on top of it.
- AZ label-to-physical-location mapping is account-specific. Use AZ IDs (e.g. use1-az1) when coordinating across accounts.