Rolling Deployments
A rolling deployment replaces instances of the old version with the new version incrementally — one batch at a time — so that the application is never fully offline. It is the default deployment strategy in Kubernetes, Amazon ECS, and most managed compute platforms because it hits a practical sweet spot: zero downtime, minimal blast radius for a bad release, and no requirement for double the infrastructure (unlike blue-green).
Understanding the mechanics thoroughly — not just the happy path — is what separates engineers who configure a rolling deployment from engineers who can operate one safely in production.
The Mechanics: Surge, Unavailable, and Batch Size
Two parameters control the entire roll. In Kubernetes they live on the Deployment's spec.strategy.rollingUpdate stanza:
- maxUnavailable: the maximum number of pods (or percentage of replicas) that may be unavailable (not ready) at any point during the rollout. Setting this to 0 means: never kill an old pod until a new one is confirmed healthy. This protects capacity at the cost of requiring the extra "surge" pod to exist.
- maxSurge: the maximum number of pods above the desired replica count that may exist simultaneously. Setting this to 1 means Kubernetes is allowed to briefly run replicas + 1 pods. This is how it safely creates the new pod before terminating the old one.
The two settings trade capacity risk against speed. At one extreme, maxUnavailable: 25%, maxSurge: 0 kills a quarter of pods first, then fills them — fast but briefly under capacity. At the other, maxUnavailable: 0, maxSurge: 1 always adds before removing — zero capacity loss but uses more nodes momentarily. Most production systems pick a middle ground based on their resource headroom and SLO.
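To make the trade-off concrete, the sketch below resolves the two settings into absolute pod counts for a given replica count. It assumes the rounding Kubernetes documents for percentages (maxSurge rounds up, maxUnavailable rounds down) and, as an assumption about controller behavior, bumps maxUnavailable to 1 when both values round to zero. The names `resolve` and `rollout_bounds` are illustrative and not part of any Kubernetes client.

```python
def resolve(value, replicas, round_up):
    """Turn an int or an 'N%' string into an absolute pod count."""
    if isinstance(value, str) and value.endswith("%"):
        pct = int(value[:-1])
        if round_up:
            return -(-pct * replicas // 100)  # integer ceiling
        return pct * replicas // 100          # integer floor
    return int(value)


def rollout_bounds(replicas, max_surge="25%", max_unavailable="25%"):
    """Return the pod-count limits that apply during a roll."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    if surge == 0 and unavailable == 0:
        # Assumption: percentages that both round to zero would block all
        # progress, so one unavailable pod is allowed.
        unavailable = 1
    return {
        "max_surge": surge,
        "max_unavailable": unavailable,
        "max_total_pods": replicas + surge,
        "min_available_pods": replicas - unavailable,
    }


if __name__ == "__main__":
    print(rollout_bounds(10))                                   # defaults
    print(rollout_bounds(4, max_surge=1, max_unavailable=0))    # add before remove
```

With the defaults on 10 replicas this resolves to a surge of 3 and an unavailability budget of 2, which shows why percentages behave differently on small deployments than on large ones.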
maxUnavailable and maxSurge cannot both be zero, because that would make progress impossible. Kubernetes validates this and rejects such a config. The defaults (maxSurge: 25%, maxUnavailable: 25%) are reasonable for stateless services but require tuning for anything with strict availability requirements.
Connection Draining: Why It Matters
When Kubernetes marks a pod for termination, it flags the pod as terminating in every Service's EndpointSlices so that new connections stop being routed to it. In-flight requests already connected to that pod are still being processed, though. Without draining, those requests are abruptly reset mid-flight.
The solution is connection draining, achieved through two cooperating mechanisms:
- terminationGracePeriodSeconds on the pod: how long Kubernetes waits before force-killing the container with SIGKILL. Defaults to 30 seconds. The countdown starts when termination begins, so it must cover both the preStop hook and the application's own shutdown. Your application must listen for SIGTERM and begin a graceful shutdown: stop accepting new connections, finish active requests, then exit cleanly.
- preStop lifecycle hook: a small sleep (typically 5–15 seconds) that runs before SIGTERM reaches the container. This window accounts for the propagation delay between the endpoint being removed from Service slices and all upstream load-balancing components (kube-proxy, Envoy sidecars, cloud LB) updating their routing state. Without this sleep, requests still arrive at the pod in the brief gap after its endpoint is removed but before every load balancer has stopped sending to it, by which time the application may already have stopped listening.
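The application side of graceful shutdown can be sketched as a small in-flight request tracker. This is a minimal illustration under stated assumptions, not a drop-in server: `DrainTracker`, `install`, and the method names are hypothetical, and a real service would call `begin_request` and `end_request` from its request-handling middleware.

```python
import signal
import threading


class DrainTracker:
    """Tracks in-flight requests so the process can exit once they finish."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_flight = 0
        self.shutting_down = False

    def handle_sigterm(self, signum, frame):
        # Only flip a flag here; signal handlers should do as little as possible.
        self.shutting_down = True

    def begin_request(self):
        """Return False if the request should be refused because we are draining."""
        with self._cond:
            if self.shutting_down:
                return False
            self._in_flight += 1
            return True

    def end_request(self):
        """Mark one request as finished and wake anyone waiting for the drain."""
        with self._cond:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._cond.notify_all()

    def wait_for_drain(self, timeout):
        """Block until no requests are in flight; return False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout)


def install(tracker):
    """Register the SIGTERM handler; must be called from the main thread."""
    signal.signal(signal.SIGTERM, tracker.handle_sigterm)
```

The timeout passed to `wait_for_drain` should stay below terminationGracePeriodSeconds minus the preStop sleep, so the process exits on its own before SIGKILL arrives.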
A missing preStop sleep is one of the most common causes of 502/504 errors during rolling deployments. The endpoint slice update and the load balancer flush are not instantaneous: there is a propagation window of 1–15 seconds depending on cluster size and LB type. Engineers who skip the sleep see a clean rollout in staging (small clusters, fast propagation) and a burst of errors in production (large clusters, slow propagation). Always add the sleep.
The Roll: Step-by-Step Diagram
The diagram below traces a 4-replica deployment rolling from v1 to v2 with maxSurge: 1, maxUnavailable: 0. Each column is one Kubernetes reconciliation tick.
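The same roll can be approximated with a short simulation. This is a simplified model rather than the controller's actual algorithm: it assumes every new pod becomes ready as soon as it is created, and the function name `simulate_roll` is illustrative. Each tuple in the result is (old pods, new pods) after one scale-up or scale-down step.

```python
def simulate_roll(replicas=4, max_surge=1, max_unavailable=0):
    """Trace (old, new) pod counts through a rolling update."""
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    old, new = replicas, 0
    trace = [(old, new)]
    while old > 0 or new < replicas:
        # Scale up: never exceed replicas + max_surge pods in total.
        create = max(0, min(replicas - new, replicas + max_surge - (old + new)))
        if create:
            new += create
            trace.append((old, new))
        # Scale down: never drop below replicas - max_unavailable pods.
        remove = max(0, min(old, (old + new) - (replicas - max_unavailable)))
        if remove:
            old -= remove
            trace.append((old, new))
    return trace


if __name__ == "__main__":
    for old, new in simulate_roll():
        print(f"v1={old} v2={new} total={old + new}")
```

For the 4-replica case the total alternates between 5 and 4 pods and never falls below 4, which is the "add before remove" behavior that maxUnavailable: 0 guarantees.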
Readiness Probes: The Safety Gate
With maxUnavailable: 0, the Deployment controller will not remove an old pod until a new one passes its readiness probe. This is the mechanism that makes maxUnavailable: 0 meaningful. If your readiness probe is wrong (it always passes, checks the wrong path, or reports success before the application has finished initializing), the controller considers a booting pod "ready" and starts pulling capacity before the service can actually handle traffic.
Many production teams point the readiness probe at a deeper health endpoint that validates database connectivity, cache reachability, and critical dependency status, not just "is the HTTP port open." A pod that answers HTTP but cannot reach its database is not ready to serve traffic and must not receive it.
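A deep readiness check of this kind might look like the following sketch. It is framework-agnostic and rests on assumptions: `readiness` is a hypothetical helper, and the dependency checks are plain callables that a real service would replace with an actual database ping, cache ping, and so on.

```python
def readiness(checks):
    """Run every dependency check and return (http_status, body)."""
    failed = []
    for name, check in checks.items():
        try:
            if not check():
                failed.append(name)
        except Exception:
            # A check that raises counts as a failed dependency.
            failed.append(name)
    if failed:
        # Any failed dependency means the pod should not receive traffic.
        return 503, {"status": "not ready", "failed": failed}
    return 200, {"status": "ready"}


if __name__ == "__main__":
    # Placeholder checks; real ones would talk to the actual dependencies.
    print(readiness({"database": lambda: True, "cache": lambda: False}))
```

The handler behind the readiness path would return this status code, so the kubelet marks the pod ready only when every listed dependency responds.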
Monitoring the Rollout and Rollback
Never kick off a rolling deployment and walk away. Use kubectl rollout status to follow the wave live. Set a deadline with spec.progressDeadlineSeconds (default 600): if the rollout makes no progress for that long, Kubernetes marks the Deployment as stalled by setting its Progressing condition to False with reason ProgressDeadlineExceeded. Kubernetes does not roll back on its own, so your CD system should treat that condition as a failure and trigger the rollback, for example with kubectl rollout undo.
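A CD step that follows the rollout and reverts on failure could be sketched as below. It shells out to kubectl rollout status and kubectl rollout undo; the function name `watch_rollout`, its defaults, and the injectable `run` parameter (there so the logic can be exercised without a cluster) are assumptions for illustration.

```python
import subprocess


def watch_rollout(deployment, namespace="default", timeout="300s", run=subprocess.run):
    """Return True if the rollout finished; otherwise undo it and return False."""
    target = f"deployment/{deployment}"
    # `kubectl rollout status` blocks until the rollout completes, fails,
    # or the timeout elapses, and exits non-zero in the last two cases.
    status = run(
        ["kubectl", "rollout", "status", target, "-n", namespace, f"--timeout={timeout}"]
    )
    if status.returncode == 0:
        return True
    # Revert to the previous revision of the Deployment.
    run(["kubectl", "rollout", "undo", target, "-n", namespace])
    return False
```

Called as `watch_rollout("web", namespace="prod")`, the function gives the pipeline a single boolean to gate on. Both names in that call are placeholders.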
ECS Rolling Deployments
Amazon ECS uses different terminology for the same concepts. minimumHealthyPercent is the complement of maxUnavailable (the share of tasks that must stay healthy rather than the share that may be down), and maximumPercent sets the surge ceiling as a percentage of the desired count. For an ECS service with 10 tasks, setting minimumHealthyPercent: 90, maximumPercent: 110 is equivalent to Kubernetes's maxUnavailable: 1, maxSurge: 1. Load balancer connection draining is configured on the target group (deregistration_delay.timeout_seconds); set it to match or exceed your longest expected request duration.
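The mapping between the two vocabularies is simple arithmetic, sketched below. The helper name `ecs_to_k8s` is hypothetical, and the rounding (minimum healthy count rounded up, maximum count rounded down) is an assumption based on how ECS documents these percentages.

```python
def ecs_to_k8s(desired_count, minimum_healthy_percent, maximum_percent):
    """Translate ECS deployment percentages into absolute Kubernetes-style counts."""
    # Assumption: ECS rounds the minimum healthy task count up...
    min_healthy = -(-desired_count * minimum_healthy_percent // 100)
    # ...and rounds the maximum task count down.
    max_tasks = desired_count * maximum_percent // 100
    return {
        "maxUnavailable": desired_count - min_healthy,
        "maxSurge": max_tasks - desired_count,
    }


if __name__ == "__main__":
    print(ecs_to_k8s(10, 90, 110))
```

For the 10-task example above this yields maxUnavailable 1 and maxSurge 1. With minimumHealthyPercent: 100 and maximumPercent: 200 it yields 0 and 10, meaning ECS may start a full second set of tasks before stopping any old ones.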
Production Failure Modes and How to Catch Them
Rolling deployments fail in predictable patterns. Knowing them in advance lets you instrument for them before the deploy, not after the incident:
- Version skew: During the roll, v1 and v2 run simultaneously. If v2 writes a database column that v1 does not know about, or changes an API response shape that a v1 caller expects, you have a skew bug. Lesson 8 (Expand-Contract) addresses this specifically — never deploy schema changes and application changes in the same roll.
- Slow readiness probe: If your app takes 90 seconds to warm up but your initialDelaySeconds is 10, the probe fires before the app is ready and fails. A failing readiness probe only keeps the pod out of rotation, but if a liveness probe uses the same short delay, the kubelet restarts the container and it ends up in a CrashLoopBackOff cycle. Either way, the rollout stalls. Set initialDelaySeconds generously and use a startupProbe for apps with variable boot times.
- Resource starvation during surge: With maxSurge: 2 on a 10-replica deployment you briefly need capacity for 12 pods. If your cluster is already at 95% capacity, the surge pods will be Pending and the rollout will stall. Cluster autoscaler helps, but it has its own latency (1–3 minutes to provision a new node). Size your cluster with at least 20–30% headroom for rolling deployments.
- PodDisruptionBudget (PDB) conflict: A PDB with minAvailable: 100% (or maxUnavailable: 0) forbids every voluntary eviction. The Deployment controller itself is not limited by PDBs during a rolling update, but a node drain or autoscaler scale-down that overlaps with the roll goes through the eviction API and blocks indefinitely. Coordinate your PDB and Deployment rolling settings.
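The surge-capacity failure mode lends itself to a pre-deploy check. The sketch below is deliberately coarse: it compares aggregate CPU requests only, ignores memory and per-node fragmentation, and uses a hypothetical function name and parameters that a real pipeline would fill from cluster metrics.

```python
def surge_headroom(max_surge, pod_cpu_millicores, allocatable_millicores, requested_millicores):
    """Return (fits, shortfall_millicores) for the extra pods a roll will create."""
    free = allocatable_millicores - requested_millicores
    needed = max_surge * pod_cpu_millicores
    # Shortfall is zero when the surge pods fit into the free capacity.
    shortfall = max(0, needed - free)
    return shortfall == 0, shortfall


if __name__ == "__main__":
    # Two surge pods of 500m each on a cluster with 1000m of unrequested CPU.
    print(surge_headroom(2, 500, 40_000, 39_000))
```

A result of (False, n) before the deploy is a signal to scale the cluster first or to lower maxSurge, instead of discovering Pending pods mid-roll.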
Rolling deployments are the workhorse strategy for routine releases. They are not appropriate for breaking changes — use feature flags (Lesson 5) or blue-green (Lesson 3) when the old and new versions cannot safely coexist. For everything else, a well-tuned rolling deployment with proper draining, tight readiness probes, and a circuit breaker is the right default.