GCP Compute & Networking
Google Cloud Platform was engineered from the inside out — built on the same global infrastructure that serves Google Search, Gmail, and YouTube. That origin story is not marketing; it directly determines how GCP computes and routes traffic, and why its architecture differs from both AWS and Azure in ways that matter enormously at scale. In this lesson you will learn GCE (the raw VM layer), GCP's uniquely global VPC model, and the load balancing tier that operates at Google's own network edge.
Google Compute Engine (GCE)
GCE is GCP's IaaS VM service. On the surface it resembles EC2 — you pick a machine type, image, disk, and network — but three design decisions set it apart.
Custom machine types. Rather than choosing from a fixed list of instance families, GCE lets you specify an exact vCPU-and-memory ratio. Need 6 vCPUs and 22 GB RAM for a legacy app that doesn't fit standard shapes? That is a first-class option. This matters operationally because right-sizing VMs is the single highest-ROI cost optimisation and GCE makes it precise.
Live migration. Google migrates running VMs transparently across physical hosts during maintenance events — no forced reboots, no interruption windows to schedule around. AWS and Azure offer scheduled maintenance windows; GCE makes it invisible by default. This changes how you architect for HA: you still need redundancy, but you are not fighting your cloud's maintenance schedule.
Preemptible and Spot VMs. GCE's equivalent of AWS Spot Instances is Spot VMs (the modern term; older docs say "preemptible"). They are up to 91% cheaper than on-demand pricing but can be reclaimed at any time with only 30 seconds' notice. They are ideal for stateless batch jobs, CI runners, and ML training that checkpoints every few minutes.
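The checkpointing pattern mentioned above can be sketched roughly as follows. This is a minimal illustration rather than a definitive implementation: do_step, save_checkpoint, and is_preempted are hypothetical hooks you would supply yourself (is_preempted might, for example, be driven by a shutdown script or a metadata poll).

```python
from typing import Callable


def run_with_checkpoints(
    total_steps: int,
    do_step: Callable[[int], None],
    save_checkpoint: Callable[[int], None],
    is_preempted: Callable[[], bool],
    checkpoint_every: int = 100,
    start_step: int = 0,
) -> int:
    """Run steps [start_step, total_steps) and return the next step to run.

    All callables are hypothetical hooks provided by the caller.
    """
    step = start_step
    last_saved = start_step
    while step < total_steps:
        # Stop promptly once the preemption notice has been seen.
        if is_preempted():
            break
        do_step(step)
        step += 1
        # Periodic checkpoint bounds how much work a preemption can lose.
        if step % checkpoint_every == 0:
            save_checkpoint(step)
            last_saved = step
    # Final save on exit (preempted or finished) if there is unsaved progress.
    if step != last_saved:
        save_checkpoint(step)
    return step
```

A replacement VM would call the same function with start_step set to the last saved value. The final save has to complete well inside the 30-second notice, so keep checkpoints small or incremental.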
Machine type naming: {family}-{vCPU}[-{memory}], where the memory suffix on custom types is in MB. General-purpose: n2-standard-4 (4 vCPU, 16 GB). Compute-optimised: c3-highcpu-8. Memory-optimised: m3-megamem-64. Custom: custom-6-22528 (6 vCPU, 22 GB = 22528 MB). Check gcloud compute machine-types list --zones us-central1-a for current availability.
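As a small sketch of how the custom naming works, the helper below turns a vCPU count and a memory size in GB into a name like custom-6-22528. The function name is hypothetical, and the 256 MB granularity check is an assumption to verify against the current GCE documentation, which also imposes per-family limits not modelled here.

```python
def custom_machine_type(vcpus: int, memory_gb: float) -> str:
    """Build a custom machine type name of the form custom-{vCPU}-{memory MB}."""
    if vcpus < 1:
        raise ValueError("vcpus must be at least 1")
    memory_mb = round(memory_gb * 1024)
    # Assumption: total memory must be a multiple of 256 MB.
    if memory_mb % 256 != 0:
        raise ValueError(f"{memory_mb} MB is not a multiple of 256 MB")
    return f"custom-{vcpus}-{memory_mb}"


print(custom_machine_type(6, 22))
```

The resulting string is what you would pass as the machine type when creating the instance.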
GCP's Global VPC Model
This is where GCP diverges most fundamentally from AWS. In AWS, a VPC is regional — you create separate VPCs per region and connect them with peering or Transit Gateway. In GCP, a single VPC spans all regions globally. You add subnets in specific regions, but they all belong to one logical network and can communicate using internal RFC-1918 addresses without any peering or gateway.
The practical implications are significant:
- A GKE cluster in us-central1 and a Cloud SQL instance in europe-west4 can speak to each other on private IPs in the same VPC, with no cross-region peering or gateway to configure. (Cloud SQL's private IP does rely on private services access being set up for that VPC.)
- Firewall rules are applied to the VPC as a whole, using network tags or service accounts as targets rather than security groups scoped to a VPC. A tag like allow-http can be attached to any VM anywhere in the VPC.
- Subnet IP ranges are regional, but they must not overlap with any other subnet range in the same VPC, even across regions, or with ranges in peered networks.
Auto-mode VPCs create a subnet in every region from the predefined 10.128.0.0/9 block, and that block conflicts with many on-premises networks. Always create custom-mode VPCs in production and define your own CIDRs deliberately.
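Choosing CIDRs deliberately is easier with a mechanical overlap check. The sketch below uses Python's standard ipaddress module; the region-to-CIDR plan and the on-premises range are made-up illustrative values, and find_conflicts is a hypothetical helper name.

```python
import ipaddress
from itertools import combinations

# Range used by auto-mode VPC networks.
AUTO_MODE_BLOCK = ipaddress.ip_network("10.128.0.0/9")


def find_conflicts(planned: dict[str, str], on_prem: list[str]) -> list[tuple[str, str]]:
    """Return pairs of names/CIDRs whose address ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in planned.items()}
    conflicts = []
    # Subnets within one VPC must not overlap each other.
    for (name_a, net_a), (name_b, net_b) in combinations(nets.items(), 2):
        if net_a.overlaps(net_b):
            conflicts.append((name_a, name_b))
    # Nor should they overlap ranges reachable over VPN or Interconnect.
    for name, net in nets.items():
        for cidr in on_prem:
            if net.overlaps(ipaddress.ip_network(cidr)):
                conflicts.append((name, cidr))
    return conflicts


plan = {
    "us-central1": "10.10.0.0/20",
    "europe-west4": "10.10.16.0/20",
    "asia-east1": "10.10.8.0/21",  # deliberately overlaps us-central1
}
print(find_conflicts(plan, on_prem=["10.128.0.0/16"]))
print(AUTO_MODE_BLOCK.overlaps(ipaddress.ip_network("10.128.0.0/16")))
```

Running a check like this in CI before applying network changes catches overlaps before they reach a hybrid connection.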
Shared VPC and VPC Peering
At enterprise scale, a single VPC is not enough. GCP offers two multi-network models: Shared VPC and VPC Network Peering. Shared VPC designates one project as the host and allows multiple service projects to attach their resources (VMs, GKE nodes) to the host's subnets. This centralises network administration — firewall rules and subnets are managed by the platform team while application teams deploy freely into those subnets. VPC Peering, by contrast, connects two separate VPC networks without making one subordinate to the other; useful for SaaS providers needing private connectivity to customer VPCs.
Cloud Load Balancing: The Google Difference
GCP's load balancing is not a fleet of virtual appliances deployed into your VPC — it is a globally distributed system running on Google's own edge infrastructure (the same infrastructure that absorbs multi-Tbps DDoS attacks). Traffic enters the Google network at the closest point of presence, and the load balancer makes routing decisions there before the packet ever reaches your VMs. This has three concrete consequences for DevOps engineers:
- Anycast IPs. A single global IP routes to the nearest healthy backend pool worldwide. AWS ALB gives you a DNS name that resolves to regional IPs; GCP's global HTTP(S) LB gives you one IP that works in every region. No Traffic Manager or Route53 latency policies needed.
- Near-instant failover. Because backend health checks run from Google's infrastructure, failover can be detected and propagated globally in under 10 seconds, depending on the health check interval and unhealthy threshold you configure. This is critical for cross-region HA.
- Built-in Cloud Armor. GCP's WAF (Cloud Armor) integrates directly with the HTTP(S) LB at the edge. DDoS mitigation and OWASP rule evaluation happen before traffic reaches your VPC, protecting capacity and reducing compute costs.
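How quickly an unhealthy backend is noticed depends partly on the health check settings. As a rough, hedged estimate only: if a backend is marked unhealthy after a number of consecutive failed probes sent at a fixed interval, the detection window is approximately their product. The function name and the example values are illustrative assumptions, and the model ignores probe timeouts and propagation delay.

```python
def detection_window_sec(check_interval_sec: float, unhealthy_threshold: int) -> float:
    """Rough time to mark a backend unhealthy: interval x consecutive failures."""
    if check_interval_sec <= 0 or unhealthy_threshold < 1:
        raise ValueError("interval must be positive and threshold at least 1")
    return check_interval_sec * unhealthy_threshold


# Illustrative settings: probe every 5 s, fail after 2 consecutive misses.
print(detection_window_sec(5, 2))
```

Shorter intervals detect failures faster but increase probe traffic and the risk of flapping on a briefly slow backend.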
GCP offers several LB tiers. Know which to use:
- Global HTTP(S) LB — Layer 7, anycast, multi-region backends, Cloud CDN integration. Use for public-facing web services.
- Regional Internal HTTP(S) LB — L7, private, intra-VPC. Use for service-mesh east-west traffic.
- TCP/UDP Network LB — L4, regional, high-throughput. Use for non-HTTP protocols.
- Internal TCP/UDP LB — L4, private, for internal services.
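The four tiers above reduce to two questions: is the traffic HTTP(S), and is it internal to the VPC? The lookup below is only a sketch of that decision using the names from this list; choose_load_balancer is a hypothetical helper, not a GCP API.

```python
# (is_http, is_internal) -> tier name as listed in this lesson
LB_TIERS = {
    (True, False): "Global HTTP(S) LB",
    (True, True): "Regional Internal HTTP(S) LB",
    (False, False): "TCP/UDP Network LB",
    (False, True): "Internal TCP/UDP LB",
}


def choose_load_balancer(is_http: bool, is_internal: bool) -> str:
    """Pick a load balancer tier from protocol and exposure."""
    return LB_TIERS[(is_http, is_internal)]


print(choose_load_balancer(is_http=True, is_internal=False))
```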
Cloud NAT and Private Google Access
VMs in a private subnet need egress without a public IP. GCP's Cloud NAT is a fully managed, highly available NAT service, with no NAT gateway instances to patch or scale. It scales NAT IP and port allocation for you and, when logging is enabled on the gateway, writes translation logs to Cloud Logging. Pair it with Private Google Access on your subnets: when enabled, VMs with only internal IPs can still reach Google APIs (Cloud Storage, Pub/Sub, BigQuery) over Google's private network without a NAT hop, saving egress cost and keeping data off the public internet entirely.
0.0.0.0/0. Verify with gcloud compute routers get-nat-mapping-info <router-name> --region=<region>.
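Even with a managed service, it helps to sanity-check NAT capacity. The arithmetic below is a sketch under two assumptions you should confirm against the current Cloud NAT documentation: each NAT IP offers 64,512 usable source ports (65,536 minus the 1,024 well-known ports), and each VM reserves a minimum of 64 ports by default. The function name is hypothetical.

```python
PORTS_PER_NAT_IP = 64512  # assumption: 65536 - 1024 usable ports per NAT IP


def nat_ips_needed(vm_count: int, min_ports_per_vm: int = 64) -> int:
    """Estimate NAT IPs needed to reserve min_ports_per_vm ports for each VM."""
    if vm_count < 0 or min_ports_per_vm < 1:
        raise ValueError("vm_count must be >= 0 and min_ports_per_vm >= 1")
    total_ports = vm_count * min_ports_per_vm
    # Ceiling division without floating point.
    return -(-total_ports // PORTS_PER_NAT_IP)


print(nat_ips_needed(2000))       # default 64 ports per VM
print(nat_ips_needed(500, 1024))  # connection-heavy workloads need more ports
```

Raising the per-VM port minimum for connection-heavy workloads increases the number of NAT IPs required much faster than adding VMs does.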
Production Failure Modes to Know
These are the GCP compute and networking issues that surface most often in production on-call rotations:
- Quota exhaustion. GCE has per-project, per-region quotas on CPUs, IPs, and persistent disk. Autoscaling jobs fail silently when quota is hit. Monitor serviceruntime.googleapis.com/quota/exceeded in Cloud Monitoring and request quota increases ahead of growth events.
- Managed instance group (MIG) healing loops. If a health check path returns non-200 during application startup (before the process is ready), the MIG will destroy and recreate the instance repeatedly. Always implement a dedicated /healthz endpoint and set initial_delay_sec on the auto-healing policy to cover your worst-case cold-start time.
- Firewall rule priority conflicts. GCP firewall rules are ordered by priority (0–65535, lower number = higher priority). The implied deny-all-ingress rule at priority 65535 is usually overridden correctly, but a misconfigured allow rule with a higher priority number (e.g., 65534) will silently lose to a deny rule at 1000. Always verify effective rules with gcloud compute firewall-rules list --sort-by=priority.
- Backend drain time misconfiguration. The LB backend service has a connection_draining_timeout_sec (default 300s). During rolling updates, if your app takes longer than this to finish in-flight requests, connections are forcibly terminated. Tune this to match your p99 request latency plus buffer.
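The firewall priority pitfall is easier to see in a toy model. The sketch below is not GCP's implementation: it assumes every rule passed in already matches the traffic, picks the lowest priority number, and lets deny win a tie at equal priority. Rule and effective_rule are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int  # 0-65535, lower number = evaluated first
    action: str    # "allow" or "deny"


# Implied rule that blocks ingress when nothing else matches.
IMPLIED_DENY_INGRESS = Rule("implied-deny-ingress", 65535, "deny")


def effective_rule(matching_rules: Iterable[Rule]) -> Rule:
    """Return the rule that decides the traffic among those that match it."""
    candidates = list(matching_rules) + [IMPLIED_DENY_INGRESS]
    # Sort key: priority first, then deny before allow at the same priority.
    return min(candidates, key=lambda r: (r.priority, r.action != "deny"))


rules = [Rule("allow-http", 65534, "allow"), Rule("block-all", 1000, "deny")]
print(effective_rule(rules).name)
```

Here the allow rule at 65534 never takes effect because the deny rule at 1000 is evaluated first, which is exactly the silent failure described in the list above.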
GCP's compute and networking layer rewards engineers who take the time to understand its global-first design. The VPC model, the LB architecture, and live migration together enable a class of always-on, globally distributed systems that would require significantly more configuration complexity on other clouds.