GKE: Kubernetes the Google Way
Kubernetes originated at Google, and GKE is Google's managed distribution of it: built and operated by engineering teams with deep roots in the upstream project and in running containers at planetary scale inside Google's own infrastructure. GKE tracks upstream Kubernetes closely through its release channels rather than shipping ahead of it. That lineage matters: GKE's defaults encode decisions that took Google years of production pain to reach. This lesson dissects those decisions (Autopilot vs Standard, node pool design, and Workload Identity) so you can operate GKE with the judgment of someone who has run it at Big-Tech scale.
Autopilot vs Standard: the Right Default
GKE offers two modes of operation that trade control for operational simplicity.
Standard mode gives you full control over node configuration: you choose machine type, disk size, GPU allocation, node OS (Container-Optimized OS or Ubuntu), and the kubelet and kernel settings that GKE exposes. You are responsible for node pool sizing, cluster autoscaler configuration, and paying for idle node capacity. Standard is the right choice when you need hardware or accelerator configurations that Autopilot does not offer, specific kernel parameters, or privileged DaemonSets that must run on every node unconditionally.
Autopilot mode removes node management entirely. You declare Pods; GKE provisions, patches, and scales the underlying nodes automatically. You pay for the CPU and memory your Pods request, not per node. Autopilot enforces a hardened security posture by default: no privileged containers, no host networking, tightly restricted hostPath volumes, and resource requests on every container (defaults are applied when you omit them). For most production workloads (web services, APIs, batch jobs, microservices) Autopilot is the production-correct default. It removes the #1 source of GKE operational toil: right-sizing node pools.
Choosing Autopilot vs Standard — Decision Criteria
Use Autopilot unless you have a specific requirement that forces Standard:
- Prefer Autopilot: stateless services, event-driven workloads, multi-tenant developer clusters, cost optimisation as a first-class goal, no custom OS/kernel requirements.
- Require Standard: privileged DaemonSets (CNI plugins, eBPF-based security agents), GPU/TPU workloads that need node-level driver or hardware tuning, manually sized spot-node pools for large batch jobs, custom node taints combined with OS-level configuration, or compliance requirements that mandate specific OS images.
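The criteria above can be read as a simple rule: start from Autopilot and switch only when a hard requirement forces Standard. The sketch below is one way to encode that rule for a design review checklist. `WorkloadNeeds` and `choose_gke_mode` are hypothetical names invented for this illustration, and the three flags are a simplified subset of the criteria, not an official GKE API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadNeeds:
    """Hypothetical checklist of hard requirements for one workload."""
    privileged_daemonsets: bool = False      # e.g. CNI plugins, eBPF security agents
    custom_kernel_parameters: bool = False   # node-level sysctl or kernel tuning
    mandated_os_image: bool = False          # compliance rules that fix the OS image


def choose_gke_mode(needs: WorkloadNeeds) -> str:
    """Return "standard" if any requirement forces it, otherwise "autopilot"."""
    forces_standard = (
        needs.privileged_daemonsets
        or needs.custom_kernel_parameters
        or needs.mandated_os_image
    )
    return "standard" if forces_standard else "autopilot"


if __name__ == "__main__":
    # A plain stateless web service has no forcing requirement.
    print(choose_gke_mode(WorkloadNeeds()))
    # An eBPF-based security agent needs privileged DaemonSets.
    print(choose_gke_mode(WorkloadNeeds(privileged_daemonsets=True)))
```

The point of the design is the default: nothing in the checklist argues for Autopilot, it is simply what remains when no forcing requirement is present.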
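Creating an Autopilot cluster takes a single command: gcloud container clusters create-auto CLUSTER_NAME --region=REGION (CLUSTER_NAME and REGION are placeholders).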
Node Pools in Standard Mode
When Standard mode is warranted, node pool design becomes a first-order architectural decision. A node pool is a group of nodes with identical machine type, disk, OS, and labels. GKE lets you run multiple pools in one cluster, and this is how production clusters achieve cost efficiency alongside performance SLOs.
The canonical pattern is three pools: a small system pool for cluster add-ons (tainted CriticalAddonsOnly so application Pods cannot schedule there), a general-purpose app pool with cluster autoscaler enabled, and a specialised pool (GPU, high-memory, or spot) for workloads with special hardware needs or tight cost targets.
Taint the system pool with CriticalAddonsOnly=true:NoSchedule. This keeps application Pods, especially memory-leaking ones, off the system nodes, so they cannot force the eviction of cluster-critical workloads like fluentd or kube-proxy under memory pressure. Treat this pattern as mandatory for any cluster above 10 nodes.
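To see why the taint keeps application Pods off the system pool, it helps to model how the scheduler compares taints with tolerations. The sketch below is a simplified model of the Kubernetes matching rules, not the scheduler's actual code: it ignores PreferNoSchedule and tolerationSeconds, and the class and function names are chosen for this illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule" or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: Optional[str] = None      # None with operator "Exists" matches every key
    operator: str = "Equal"        # "Equal" or "Exists"
    value: Optional[str] = None
    effect: Optional[str] = None   # None matches every effect


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Return True if this single toleration matches this single taint."""
    if toleration.effect is not None and toleration.effect != taint.effect:
        return False
    if toleration.operator == "Exists":
        return toleration.key is None or toleration.key == taint.key
    return toleration.key == taint.key and toleration.value == taint.value


def can_schedule(tolerations: Sequence[Toleration], taints: Sequence[Taint]) -> bool:
    """A Pod fits only if every scheduling-blocking taint is tolerated."""
    blocking = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, taint) for tol in tolerations) for taint in blocking)


if __name__ == "__main__":
    system_pool_taints = [Taint("CriticalAddonsOnly", "true", "NoSchedule")]
    addon_tolerations = [Toleration(key="CriticalAddonsOnly", operator="Exists")]
    # An application Pod with no tolerations is kept off the system pool.
    print(can_schedule([], system_pool_taints))
    # An add-on that tolerates the taint may land there.
    print(can_schedule(addon_tolerations, system_pool_taints))
```

Note that a taint only repels Pods. To pin add-ons to the system pool as well, they also need a node selector or node affinity for that pool's labels.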
Workload Identity: The Correct Way to Grant Cloud Permissions
The most common GKE security mistake is storing a GCP service account key as a Kubernetes Secret and mounting it into Pods. Keys can be exfiltrated, rotated improperly, and left behind in container images. Workload Identity eliminates keys entirely by binding a Kubernetes ServiceAccount to a GCP IAM Service Account using a federated token exchange — the Pod gets a short-lived OIDC token automatically, with zero static credentials anywhere.
The binding works in three steps: enable Workload Identity on the cluster, annotate the Kubernetes ServiceAccount with the email of the GCP IAM Service Account, and grant the Kubernetes namespace/ServiceAccount pair the roles/iam.workloadIdentityUser role on that GCP IAM Service Account.
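The Kubernetes ServiceAccount manifest needs a single annotation to complete the binding: iam.gke.io/gcp-service-account, set to the email of the GCP IAM Service Account (GSA_NAME@PROJECT_ID.iam.gserviceaccount.com, where GSA_NAME and PROJECT_ID are placeholders).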
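The two strings that most often go wrong in this binding are the IAM member that names the Kubernetes ServiceAccount and the annotation value on the manifest. The sketch below builds both from their parts so the format is explicit. The helper names and the example project, namespace, and account names are hypothetical; the string formats follow the Workload Identity conventions as understood here and should be checked against the current GKE documentation.

```python
import json


def gsa_email(gsa_name: str, project_id: str) -> str:
    """Email address of a GCP IAM Service Account."""
    return f"{gsa_name}@{project_id}.iam.gserviceaccount.com"


def workload_identity_member(project_id: str, namespace: str, ksa_name: str) -> str:
    """IAM member that identifies one Kubernetes ServiceAccount in the workload pool.

    This is the member that receives roles/iam.workloadIdentityUser
    on the GCP IAM Service Account.
    """
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]"


def ksa_manifest(namespace: str, ksa_name: str, gsa_name: str, project_id: str) -> dict:
    """Kubernetes ServiceAccount manifest carrying the binding annotation."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": ksa_name,
            "namespace": namespace,
            "annotations": {
                "iam.gke.io/gcp-service-account": gsa_email(gsa_name, project_id),
            },
        },
    }


if __name__ == "__main__":
    # All four values below are placeholders for illustration only.
    project, namespace, ksa, gsa = "my-project", "payments", "api-ksa", "api-gsa"
    print(workload_identity_member(project, namespace, ksa))
    # JSON is valid YAML, so this output can be piped to kubectl apply -f -.
    print(json.dumps(ksa_manifest(namespace, ksa, gsa, project), indent=2))
```

Generating both strings from the same inputs avoids the common failure where the annotation and the IAM binding refer to different namespaces or account names.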
--no-enable-autoupgrade on very old node versions. Verify with kubectl describe node | grep -i workload and ensure nodes are running GKE 1.18+ (all release channels are well past this). Also, any Pod that calls the GCP metadata server directly to get node-level credentials (a classic lateral-movement attack) is blocked by Workload Identity: the GKE metadata server intercepts the request and returns a token for the IAM Service Account bound to the Pod's Kubernetes ServiceAccount, never for the node's service account. This is a critical security boundary.
Release Channels and the Upgrade Contract
GKE clusters subscribe to a release channel: rapid, regular, or stable. Google manages all control-plane upgrades automatically on channeled clusters. Node upgrades can be configured with surge upgrades (extra nodes brought up before old ones are drained) or blue/green node pool upgrades (full parallel pool, traffic shifted, old pool deleted). For production, regular channel with blue/green upgrades is the recommended baseline: you stay current without running on untested builds, and, with PodDisruptionBudgets in place, upgrades can complete without downtime.
For surge upgrades, set --max-surge-upgrade=1 --max-unavailable-upgrade=0. This ensures one extra node is brought up before any old node is drained during a rolling upgrade, so no workload is evicted without a landing spot.
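The effect of the two surge settings is easy to work out by hand. The sketch below does that arithmetic for a node pool, assuming the usual GKE behaviour where the number of nodes upgraded at once is the sum of the two values. `surge_plan` is a hypothetical helper for this lesson, not part of any GKE tooling.

```python
def surge_plan(pool_size: int, max_surge: int, max_unavailable: int) -> dict:
    """Summarise capacity during a surge upgrade of one node pool."""
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    if max_surge < 0 or max_unavailable < 0:
        raise ValueError("surge settings cannot be negative")
    if max_surge + max_unavailable == 0:
        raise ValueError("at least one of the two settings must be positive")
    return {
        # How many nodes are replaced at the same time.
        "nodes_upgraded_in_parallel": max_surge + max_unavailable,
        # Lowest number of schedulable nodes at any point in the upgrade.
        "min_ready_nodes": max(pool_size - max_unavailable, 0),
        # Highest number of nodes you are billed for during the upgrade.
        "peak_node_count": pool_size + max_surge,
    }


if __name__ == "__main__":
    # Settings from the text: capacity never dips, one extra node is paid for.
    print(surge_plan(pool_size=6, max_surge=1, max_unavailable=0))
    # Faster but riskier: two nodes at a time, one of them taken out of service.
    print(surge_plan(pool_size=6, max_surge=1, max_unavailable=1))
```

With max_unavailable at 0 the pool never drops below its normal size. The cost is a slower upgrade and one extra node of quota for its duration.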
Production Failure Modes to Know
- PodDisruptionBudget gaps during node upgrades: if you have not defined a PDB, GKE will drain a node aggressively and your service will see errors. Define a PDB such as minAvailable: 2 for any production Deployment with at least three replicas; setting minAvailable equal to the replica count stalls node drains instead of protecting you.
- IP exhaustion: GKE in VPC-native mode reserves a large secondary CIDR for Pods. If you undersize the secondary subnet at cluster creation, you hit the IP ceiling silently: new Pods fail to schedule with no available IP addresses. Plan for peak Pod count + 30 % headroom at day zero.
- Workload Identity metadata server latency: the first token fetch from 169.254.169.254 can add 200–400 ms to a cold-start. Pre-warm GCP client libraries at application startup, not per-request.
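The IP exhaustion case can be checked with arithmetic before the cluster exists. The sketch below assumes the VPC-native behaviour in which each node receives its own slice of the Pod range, sized to at least twice its maximum Pod count and rounded up to a power of two (a /24 per node at 110 Pods per node). The function names and the example CIDR are hypothetical, and the node estimate is a lower bound because Pods rarely pack at full density.

```python
import ipaddress


def per_node_prefix(max_pods_per_node: int) -> int:
    """Prefix length of the Pod range carved out for each node."""
    needed = 2 * max_pods_per_node
    block = 1
    while block < needed:  # round up to the next power of two
        block *= 2
    return 32 - (block.bit_length() - 1)


def max_nodes(pod_cidr: str, max_pods_per_node: int = 110) -> int:
    """How many nodes the secondary Pod range can serve."""
    network = ipaddress.ip_network(pod_cidr)
    node_prefix = per_node_prefix(max_pods_per_node)
    if node_prefix < network.prefixlen:
        return 0  # the range is too small for even one node
    return 2 ** (node_prefix - network.prefixlen)


def nodes_needed(peak_pods: int, max_pods_per_node: int = 110, headroom_pct: int = 30) -> int:
    """Lower bound on nodes for the peak Pod count plus headroom."""
    pods = -(-peak_pods * (100 + headroom_pct) // 100)  # ceiling division
    return -(-pods // max_pods_per_node)


if __name__ == "__main__":
    cidr = "10.8.0.0/20"  # hypothetical secondary range
    print(per_node_prefix(110))   # 24
    print(max_nodes(cidr))        # 16
    print(nodes_needed(1000))     # 12
    print(nodes_needed(1000) <= max_nodes(cidr))  # True
```

The node limit, not the raw address count, is what runs out first: a /20 holds 4096 addresses but serves only 16 nodes at the default density. Lowering the maximum Pods per node raises that ceiling.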