Multi-Cloud: Azure & GCP

GKE: Kubernetes the Google Way

Kubernetes originated at Google, drawing on years of experience running containers at scale on its internal Borg system. GKE is Google's managed Kubernetes service: Google operates the control plane for you, and new upstream releases reach clusters through GKE's release channels. That lineage matters: GKE's defaults encode decisions shaped by years of production experience. This lesson dissects those decisions (Autopilot vs Standard, node pool design, and Workload Identity) so you can operate GKE with sound production judgment.

Autopilot vs Standard: the Right Default

GKE offers two modes of operation that trade control for operational simplicity.

Standard mode gives you full control over node configuration: you choose machine type, disk size, GPU allocation, node OS (Container-Optimized OS or Ubuntu), and a supported set of kubelet and kernel settings. You are responsible for node pool sizing, cluster autoscaler configuration, and paying for idle node capacity. Standard is the right choice when you need node-level control: specific machine shapes or accelerator setups, specific kernel parameters, or DaemonSets that must run on every node unconditionally.

Autopilot mode removes node management entirely. You declare Pods; GKE provisions, patches, and scales the underlying nodes automatically. You pay for each Pod's requested CPU and memory, not per node. Autopilot enforces a hardened security posture by default: no privileged containers, no host networking, tightly restricted hostPath volumes, and resource requests on every container (defaults are applied when you omit them). For most production workloads (web services, APIs, batch jobs, microservices) Autopilot is the sensible default. It removes a major source of GKE operational toil: right-sizing node pools.

The Autopilot billing model changes what you pay for. In Standard mode, a node with 8 vCPU that sits idle between deployments still costs the same as a busy one. In Autopilot you pay only for the sum of requested CPU and memory across all running Pods, so overprovisioned nodes disappear as a cost vector. At scale this can cut compute bills by 30–50 % compared with under-optimised Standard clusters, though the saving depends on how poorly utilised the nodes were: a tightly packed Standard cluster may see little or no saving.
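
To make the idle-capacity argument concrete, the sketch below computes how much of a Standard pool's provisioned vCPU no Pod has requested. The node count and request total are hypothetical, and the helper name idle_pct is made up for illustration. It compares capacity only, not price, because Autopilot and Compute Engine bill at different rates.

```shell
#!/usr/bin/env bash
# Percentage of provisioned vCPU that no Pod has requested (integer, rounded down).
# Usage: idle_pct <provisioned_vcpu> <requested_vcpu>
idle_pct() {
  local provisioned=$1 requested=$2
  echo $(( (provisioned - requested) * 100 / provisioned ))
}

# Hypothetical Standard pool: 3 nodes x 8 vCPU, Pods requesting 14 vCPU in total.
nodes=3
vcpu_per_node=8
requested_vcpu=14
provisioned_vcpu=$(( nodes * vcpu_per_node ))

echo "provisioned=${provisioned_vcpu} requested=${requested_vcpu} idle=$(idle_pct "$provisioned_vcpu" "$requested_vcpu")%"
```

With these example numbers, 10 of the 24 provisioned vCPU are paid for but unrequested. Under Autopilot's model that unrequested share is not billed.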

Choosing Autopilot vs Standard — Decision Criteria

Use Autopilot unless you have a specific requirement that forces Standard:

  • Choose Autopilot: stateless services, event-driven workloads, multi-tenant developer clusters, cost optimisation as a first-class goal, no custom OS/kernel requirements.
  • Choose Standard: privileged DaemonSets (CNI plugins, eBPF-based security agents), GPU/TPU or Spot node pools where you need to control the machine shape and node configuration yourself, custom kernel-level settings, or compliance requirements that mandate specific OS images. Autopilot can also run GPU and Spot workloads, so those alone do not force Standard.

Creating an Autopilot cluster takes a single command:

# Create an Autopilot cluster in us-central1
gcloud container clusters create-auto prod-cluster \
  --region=us-central1 \
  --release-channel=regular \
  --network=prod-vpc \
  --subnetwork=prod-gke-subnet \
  --cluster-secondary-range-name=pods \
  --services-secondary-range-name=services

# Get credentials
gcloud container clusters get-credentials prod-cluster \
  --region=us-central1

Node Pools in Standard Mode

When Standard mode is warranted, node pool design becomes a first-order architectural decision. A node pool is a group of nodes with identical machine type, disk, OS, and labels. GKE lets you run multiple pools in one cluster, and this is how production clusters achieve cost efficiency alongside performance SLOs.

[Figure: GKE Standard Cluster Node Pool Layout (prod-cluster). system-pool: e2-standard-4, 3 nodes, taint CriticalAddonsOnly, running kube-dns, cluster-autoscaler, metrics-server, cloud-controller-manager. app-pool: n2-standard-8, 3–20 nodes, Cluster Autoscaler enabled, running api-service, web-frontend, worker, job-processor. gpu-pool: a2-highgpu-1g, 0–5 spot nodes, taint nvidia.com/gpu=present:NoSchedule, running ml-inference and batch-training, both of which tolerate the taint.]
A production GKE Standard cluster with three node pools: system (addons), app (general workloads), and GPU (ML inference/training on spot nodes).

The canonical pattern is three pools: a small system pool for cluster add-ons (tainted CriticalAddonsOnly so application Pods cannot schedule there), a general-purpose app pool with cluster autoscaler enabled, and a specialised pool (GPU, high-memory, or spot) for workloads with particular hardware or cost requirements.

# Create a Standard cluster; its initial node pool acts as the system pool.
# gcloud creates the initial pool as default-pool; treat it as the system pool,
# or replace it with a pool named system-pool via node-pools create.
# In a regional cluster, node counts are per zone: 1 node x 3 zones = 3 nodes.
gcloud container clusters create prod-cluster \
  --region=us-central1 \
  --release-channel=regular \
  --enable-ip-alias \
  --network=prod-vpc \
  --subnetwork=prod-gke-subnet \
  --machine-type=e2-standard-4 \
  --num-nodes=1 \
  --node-taints=CriticalAddonsOnly=true:NoSchedule

# Add an app pool with autoscaling
# (--num-nodes, --min-nodes and --max-nodes also apply per zone here)
gcloud container node-pools create app-pool \
  --cluster=prod-cluster \
  --region=us-central1 \
  --machine-type=n2-standard-8 \
  --num-nodes=3 \
  --enable-autoscaling \
  --min-nodes=3 \
  --max-nodes=20 \
  --disk-type=pd-ssd \
  --disk-size=100

# Add a GPU spot pool for ML workloads; it scales from zero on demand
gcloud container node-pools create gpu-pool \
  --cluster=prod-cluster \
  --region=us-central1 \
  --machine-type=a2-highgpu-1g \
  --accelerator=type=nvidia-tesla-a100,count=1 \
  --spot \
  --num-nodes=0 \
  --enable-autoscaling \
  --min-nodes=0 \
  --max-nodes=5 \
  --node-taints=nvidia.com/gpu=present:NoSchedule
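Always separate system and application workloads. Taint your system pool CriticalAddonsOnly=true:NoSchedule. This keeps application Pods, especially memory-leaking ones, off the nodes that host cluster-critical add-ons such as kube-dns and metrics-server, so a misbehaving application cannot push them out under memory pressure. DaemonSets like fluentd or kube-proxy still run on every node, so protect them on the app pool with resource requests and priority classes. Treat this pattern as the baseline for any cluster above roughly 10 nodes.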
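
Taints keep Pods out of a pool; a workload that belongs on a tainted pool needs a matching toleration and, usually, a node selector. The sketch below writes a Deployment manifest aimed at gpu-pool. The Deployment name, the app label, and the image are placeholders, and the manifest is only written to disk, not applied.

```shell
#!/usr/bin/env bash
# Write a Deployment that targets gpu-pool. Names and image are placeholders.
cat > ml-inference.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-inference
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ml-inference
  template:
    metadata:
      labels:
        app: ml-inference
    spec:
      # GKE labels every node with the name of its node pool.
      nodeSelector:
        cloud.google.com/gke-nodepool: gpu-pool
      # Matches the taint set with --node-taints on gpu-pool.
      tolerations:
        - key: nvidia.com/gpu
          operator: Equal
          value: present
          effect: NoSchedule
      containers:
        - name: inference
          image: gcr.io/YOUR_PROJECT_ID/ml-inference:v1  # hypothetical image
          resources:
            limits:
              nvidia.com/gpu: 1
EOF

echo "wrote ml-inference.yaml"
```

Apply it with kubectl apply -f ml-inference.yaml. Because the pool scales from zero, the first Pod stays Pending until the autoscaler has brought up a GPU node.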

Workload Identity: The Correct Way to Grant Cloud Permissions

The most common GKE security mistake is storing a GCP service account key as a Kubernetes Secret and mounting it into Pods. Keys can be exfiltrated, rotated improperly, and left behind in container images. Workload Identity (now named Workload Identity Federation for GKE) eliminates keys entirely by binding a Kubernetes ServiceAccount to a GCP IAM Service Account through a federated token exchange: the Pod obtains short-lived credentials automatically, with zero static credentials anywhere.

The binding works in three steps: enable Workload Identity on the cluster and its node pools, create a GCP IAM Service Account with the roles the workload needs, and grant the Kubernetes ServiceAccount (identified by its namespace/name pair) the roles/iam.workloadIdentityUser role on that IAM Service Account. The Kubernetes ServiceAccount is then annotated with the IAM Service Account's email.

# 1. Enable Workload Identity on an existing cluster
gcloud container clusters update prod-cluster \
  --region=us-central1 \
  --workload-pool=YOUR_PROJECT_ID.svc.id.goog

# Existing node pools must also switch to the GKE metadata server
gcloud container node-pools update app-pool \
  --cluster=prod-cluster \
  --region=us-central1 \
  --workload-metadata=GKE_METADATA

# 2. Create a GCP IAM Service Account for the workload
gcloud iam service-accounts create api-service-sa \
  --display-name="API Service SA"

# Grant it the permissions it actually needs (principle of least privilege)
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:api-service-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"

# 3. Allow the Kubernetes ServiceAccount to act as the GCP IAM SA
gcloud iam service-accounts add-iam-policy-binding \
  api-service-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:YOUR_PROJECT_ID.svc.id.goog[api-namespace/api-service-ksa]"

The Kubernetes ServiceAccount manifest needs a single annotation to complete the binding:

# kubernetes/service-account.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: api-service-ksa
  namespace: api-namespace
  annotations:
    iam.gke.io/gcp-service-account: api-service-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com
---
# Use it in a Deployment; no secret mounts required
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
  namespace: api-namespace
spec:
  # apps/v1 requires a selector that matches the Pod template labels
  selector:
    matchLabels:
      app: api-service
  template:
    metadata:
      labels:
        app: api-service
    spec:
      serviceAccountName: api-service-ksa  # <-- this is all that is needed
      containers:
        - name: api
          image: gcr.io/YOUR_PROJECT_ID/api:v1.2.0
          # GCP client libraries auto-detect the Workload Identity token
          # via the metadata server at 169.254.169.254
Workload Identity only takes effect on node pools that run the GKE metadata server. Node pools created before Workload Identity was enabled on the cluster keep the old behaviour until you update them with --workload-metadata=GKE_METADATA. Verify with kubectl get nodes -l iam.gke.io/gke-metadata-server-enabled=true, which lists the nodes where it is active. Also, a Pod that calls the metadata server to obtain node-level credentials (a classic lateral-movement technique) no longer gets them: the GKE metadata server intercepts the request and returns a token for the IAM Service Account bound to the Pod's Kubernetes ServiceAccount, not for the node's service account. This is a critical security boundary.
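
One way to confirm the binding from inside a running Pod is to ask the metadata server which identity it hands out. The sketch below is a minimal check under that assumption; fetch_identity and check_identity are hypothetical helper names, and the Pod's image needs curl for it to work.

```shell
#!/usr/bin/env bash
# Endpoint that returns the email of the identity the metadata server hands out.
METADATA_URL="http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/email"

# Fetch the identity as seen from inside the Pod.
fetch_identity() {
  curl -sS --fail -H "Metadata-Flavor: Google" "$METADATA_URL"
}

# Compare the observed identity with the expected IAM service account.
# Usage: check_identity <expected_email> <actual_email>
check_identity() {
  local expected=$1 actual=$2
  if [ "$actual" = "$expected" ]; then
    echo "OK: Pod runs as ${actual}"
  else
    echo "MISMATCH: expected ${expected}, got ${actual}"
    return 1
  fi
}
```

Inside the Pod, run check_identity "api-service-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com" "$(fetch_identity)". Any other identity in the output suggests the annotation, the IAM binding, or the node pool's metadata setting is not in place.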

Release Channels and the Upgrade Contract

GKE clusters subscribe to a release channel: rapid, regular, or stable. Google manages all control-plane upgrades automatically on channeled clusters. Node upgrades can be configured with surge upgrades (extra nodes brought up before old ones are drained) or blue/green node pool upgrades (full parallel pool, workloads shifted, old pool deleted). For production, the regular channel with blue/green upgrades is a sound baseline: you stay current without running the newest builds, and rollback is straightforward. Either strategy avoids downtime only if workloads run more than one replica and define PodDisruptionBudgets.

GKE Autopilot manages node upgrades for you. In Standard mode, if you use surge upgrades rather than blue/green, configure --max-surge-upgrade=1 --max-unavailable-upgrade=0. GKE then adds one new node before draining an old one, so the pool never drops below its current capacity and no workload is evicted without a landing spot.
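
Surge capacity gives evicted Pods somewhere to land, but how many Pods a drain may evict at once is governed by PodDisruptionBudgets. The sketch below writes a budget for the api-service Deployment and includes a small helper, pdb_allows_drain (a hypothetical name), that checks whether a minAvailable value leaves room for a drain to proceed. The app: api-service label is an assumption about how the Pods are labelled.

```shell
#!/usr/bin/env bash
# Succeeds when a budget of <min_available> leaves at least one Pod evictable
# out of <replicas>, which a node drain needs in order to make progress.
# Usage: pdb_allows_drain <replicas> <min_available>
pdb_allows_drain() {
  local replicas=$1 min_available=$2
  [ $(( replicas - min_available )) -ge 1 ]
}

# Write a budget that allows one api-service Pod to be down at a time.
cat > api-service-pdb.yaml <<'EOF'
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-service-pdb
  namespace: api-namespace
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: api-service   # assumed Pod label
EOF

pdb_allows_drain 3 2 && echo "3 replicas, minAvailable 2: drain can proceed"
pdb_allows_drain 2 2 || echo "2 replicas, minAvailable 2: drain is blocked"
```

Using maxUnavailable: 1 rather than a fixed minAvailable keeps the budget valid when the replica count changes. Apply the file with kubectl apply -f api-service-pdb.yaml.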

Production Failure Modes to Know

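  • PodDisruptionBudget gaps during node upgrades: without a PDB, a node drain may evict every replica on that node at once, and your service will see errors if replicas share nodes. Define a PDB for any production Deployment with more than one replica, for example maxUnavailable: 1, and keep minAvailable below the replica count. A budget that can never be satisfied (minAvailable: 2 on a two-replica Deployment) blocks the drain instead of protecting the service.
  • IP exhaustion: GKE in VPC-native mode allocates Pod IPs from a secondary range and carves out a fixed block per node (a /24 at the Standard-mode default of 110 Pods per node). If you undersize the secondary range at cluster creation, you hit the ceiling with little warning: the autoscaler cannot add nodes and new Pods stay Pending. Plan for peak Pod count + 30 % headroom at day zero.
  • Workload Identity metadata server latency: the first token fetch from 169.254.169.254 can add latency to a cold start, on the order of 200–400 ms. Initialise GCP clients once at application startup and reuse them, rather than creating them per request.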
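
The IP exhaustion item comes down to arithmetic that is worth doing before the cluster exists. The sketch below estimates how many nodes a Pod secondary range can serve, assuming GKE reserves for each node the smallest power-of-two block that holds at least twice the per-node Pod limit. The function names are hypothetical, and the result is an upper bound rather than a guarantee.

```shell
#!/usr/bin/env bash
# Size of the per-node Pod range: the smallest power of two that holds
# at least twice the per-node Pod limit.
# Usage: node_block_size <max_pods_per_node>
node_block_size() {
  local max_pods=$1 size=1
  while [ "$size" -lt $(( max_pods * 2 )) ]; do
    size=$(( size * 2 ))
  done
  echo "$size"
}

# Number of nodes a Pod secondary range of the given prefix length can serve.
# Usage: max_nodes <pod_range_prefix_length> <max_pods_per_node>
max_nodes() {
  local prefix=$1 max_pods=$2
  local total=$(( 1 << (32 - prefix) ))
  echo $(( total / $(node_block_size "$max_pods") ))
}

for prefix in 14 18 20 22; do
  echo "/${prefix} Pod range, 110 Pods per node: $(max_nodes "$prefix" 110) nodes"
done
```

At 110 Pods per node, a /20 Pod range tops out at 16 nodes, far below the 20-node ceiling of the app pool alone. Lowering the per-node Pod limit shrinks each node's block and stretches the same range across more nodes.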