Kubernetes Fundamentals

ReplicaSets & Deployments

Running a single Pod works for experimentation, but it is not production-grade. If the node hosting that Pod crashes, Kubernetes does nothing — the Pod is gone and so is your service. The answer is desired state: you tell Kubernetes how many replicas you want and it continuously makes reality match that declaration. ReplicaSets implement this guarantee; Deployments wrap ReplicaSets with safe rollout and rollback mechanics. Together they are how every stateless workload — APIs, web servers, batch workers — runs at scale in production.

Desired State and the Control Loop

Kubernetes is a level-triggered system. You write a spec that declares what you want (desired state). A controller watches the cluster and acts whenever actual state diverges from desired state. The ReplicaSet controller does exactly one thing: it reconciles spec.replicas (desired) against the number of running Pods whose labels match spec.selector (actual). Too few → create Pods. Too many → delete Pods. The spec is stored in etcd; it survives node crashes, restarts, and network partitions.
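The reconciliation step described above can be sketched in a few lines of Python. This is a simplified model, not the real controller code: the Pod class, the matches helper, and the reconcile function are hypothetical names, and the real controller also handles Pod phases, ownership, and deletion ordering.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    labels: dict = field(default_factory=dict)


def matches(selector: dict, labels: dict) -> bool:
    """True when every key/value pair in the selector appears in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def reconcile(desired_replicas: int, selector: dict, pods: list) -> tuple:
    """Compare desired and actual state and return the action that closes the gap."""
    actual = sum(1 for pod in pods if matches(selector, pod.labels))
    if actual < desired_replicas:
        return ("create", desired_replicas - actual)
    if actual > desired_replicas:
        return ("delete", actual - desired_replicas)
    return ("noop", 0)


pods = [
    Pod("api-1", {"app": "api-server"}),
    Pod("api-2", {"app": "api-server"}),
    Pod("other-1", {"app": "billing"}),  # ignored: labels do not match the selector
]
print(reconcile(3, {"app": "api-server"}, pods))  # ('create', 1)
```

Because the function looks only at the current state and never at a history of events, running it again after a missed notification still produces the right action. That is what "level-triggered" means in practice.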

Why you almost never write a ReplicaSet directly: ReplicaSets give you self-healing replicas but no safe upgrade path. If you change the Pod template inside a ReplicaSet, existing Pods are not replaced — only new Pods created after the change use the new template. Deployments solve this by managing ReplicaSets for you and providing a controlled upgrade mechanism.

Anatomy of a Deployment

A Deployment manifest has three main sections: metadata, a selector, and a Pod template. The Deployment controller creates a ReplicaSet whose Pod template matches the one you specified, then ensures the correct number of Pods are running from that template. Here is a production-realistic manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
  namespace: production
  labels:
    app: api-server
    team: platform
spec:
  replicas: 6
  selector:
    matchLabels:
      app: api-server          # MUST match pod template labels -- immutable after creation
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2              # at most 8 pods exist during rollout (6 + 2)
      maxUnavailable: 0        # never drop below 6 ready pods (zero-downtime)
  minReadySeconds: 10          # pod must be ready for 10 s before counted as available
  progressDeadlineSeconds: 300 # mark the rollout as failed if it makes no progress for 5 min
  template:
    metadata:
      labels:
        app: api-server
        version: "2.4.1"
    spec:
      containers:
      - name: api
        image: myrepo/api-server:2.4.1
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: "250m"
            memory: "256Mi"
          limits:
            cpu: "1000m"
            memory: "512Mi"
        readinessProbe:
          httpGet:
            path: /healthz/ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
        livenessProbe:
          httpGet:
            path: /healthz/live
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 10

Production pitfall: selector immutability. The spec.selector field is immutable after a Deployment is created. If you need to change the selector, you must delete and recreate the Deployment. Pod template labels can still change, provided they continue to match the selector. A common practice is to put only a stable label such as app: api-server in the selector and to add mutable metadata such as version to the Pod template labels only. An attempt to change the selector causes kubectl apply to be rejected with a validation error.
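The two selector rules can be expressed as a small pre-flight check, for example in a CI step before kubectl apply. This is a sketch under stated assumptions: validate_update and its error strings are hypothetical and do not reproduce the API server's actual messages.

```python
def selector_matches_template(selector: dict, template_labels: dict) -> bool:
    """The selector must be a subset of the Pod template labels."""
    return all(template_labels.get(k) == v for k, v in selector.items())


def validate_update(old_selector: dict, new_selector: dict, new_template_labels: dict) -> list:
    """Return a list of problems; an empty list means the update looks acceptable."""
    errors = []
    if new_selector != old_selector:
        errors.append("spec.selector is immutable")  # illustrative message
    if not selector_matches_template(new_selector, new_template_labels):
        errors.append("selector does not match template labels")  # illustrative message
    return errors


old = {"app": "api-server"}

# Changing only the mutable version label on the template is fine.
print(validate_update(old, {"app": "api-server"},
                      {"app": "api-server", "version": "2.5.0"}))  # []

# Adding version to the selector changes an immutable field.
print(validate_update(old, {"app": "api-server", "version": "2.5.0"},
                      {"app": "api-server", "version": "2.5.0"}))
```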

Rolling Updates: How Kubernetes Replaces Pods Safely

When you change anything in spec.template — a new image tag, an environment variable, a resource limit — Kubernetes creates a new ReplicaSet for the new Pod template and orchestrates a handoff between the old and new ReplicaSets. The two key levers are maxSurge and maxUnavailable.

Rolling update with maxSurge=2, maxUnavailable=0: ready Pods never drop below 6; at most 8 Pods exist during the transition.
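To see where the numbers 6 and 8 come from, the handoff can be simulated. The sketch below is a deliberately simplified model, not the Deployment controller's algorithm: simulate_rollout is a hypothetical function, and it assumes that every new Pod started in a step becomes ready before old Pods are removed.

```python
def simulate_rollout(replicas: int, max_surge: int, max_unavailable: int) -> list:
    """Return the (old_pods, new_pods) counts after each scale-up and scale-down."""
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    old, new = replicas, 0
    states = [(old, new)]
    while old > 0 or new < replicas:
        # Scale up: the total Pod count may not exceed replicas + maxSurge.
        room = replicas + max_surge - (old + new)
        new += min(room, replicas - new)
        states.append((old, new))
        # Scale down: ready Pods may not fall below replicas - maxUnavailable.
        # Assumption: all new Pods counted here have already passed readiness.
        removable = max(0, (old + new) - (replicas - max_unavailable))
        old -= min(old, removable)
        states.append((old, new))
    return states


states = simulate_rollout(replicas=6, max_surge=2, max_unavailable=0)
print(states)
print("peak Pods:", max(o + n for o, n in states))
print("minimum Pods:", min(o + n for o, n in states))
```

With replicas=6, maxSurge=2, and maxUnavailable=0 the total oscillates between 6 and 8 until the old ReplicaSet reaches zero, which matches the bounds stated above.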

The readiness probe is the gatekeeper. Kubernetes considers a Pod "available" only after its readiness probe passes and the Pod has been Ready for at least minReadySeconds. This means a buggy deployment that crashes on startup will stall — the new ReplicaSet never becomes available, maxUnavailable: 0 prevents the old Pods from being terminated, and you have time to notice and roll back before any traffic is lost.
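The availability rule and the resulting stall can be written down directly. This is an illustrative sketch: is_available and removable_old_pods are hypothetical helpers, and timestamps are plain seconds rather than real Pod condition times.

```python
from typing import Optional


def is_available(ready_since: Optional[float], now: float, min_ready_seconds: float = 0) -> bool:
    """A Pod is available once it has been Ready for at least min_ready_seconds."""
    if ready_since is None:  # the readiness probe has never passed
        return False
    return now - ready_since >= min_ready_seconds


def removable_old_pods(available_total: int, replicas: int, max_unavailable: int) -> int:
    """How many old Pods may be terminated without violating maxUnavailable."""
    return max(0, available_total - (replicas - max_unavailable))


# A Pod that became Ready 5 s ago is not yet available with minReadySeconds: 10.
print(is_available(ready_since=95.0, now=100.0, min_ready_seconds=10))  # False
print(is_available(ready_since=90.0, now=100.0, min_ready_seconds=10))  # True

# New Pods crash on startup: only the 6 old Pods are available, so none may be removed.
print(removable_old_pods(available_total=6, replicas=6, max_unavailable=0))  # 0
```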

Performing and Watching a Rollout

# Trigger a rollout by updating the image
kubectl set image deployment/api-server api=myrepo/api-server:2.5.0 -n production

# Or edit the manifest and re-apply
kubectl apply -f deployment.yaml

# Record why the change was made (best practice for audit trails);
# the annotation appears as CHANGE-CAUSE in the rollout history
kubectl annotate deployment/api-server \
  kubernetes.io/change-cause="Upgrade to 2.5.0 -- JIRA-1234" \
  -n production

# Watch rollout progress in real time
kubectl rollout status deployment/api-server -n production
# Waiting for deployment "api-server" rollout to finish: 2 out of 6 new replicas have been updated...

# Inspect rollout history
kubectl rollout history deployment/api-server -n production
# REVISION  CHANGE-CAUSE
# 1         <none>
# 2         Upgrade to 2.5.0 -- JIRA-1234

# See details of a specific revision
kubectl rollout history deployment/api-server --revision=2 -n production

Rollbacks: Undoing a Bad Deploy

Kubernetes keeps a configurable number of old ReplicaSets around after a rollout completes — controlled by spec.revisionHistoryLimit (default 10). Rolling back restores the previous ReplicaSet's Pod template and scales it back up, repeating the rolling update process in reverse. At Google and similar companies, rollback is a normal operational action, not a crisis — the system is designed to make it fast and safe.

# Immediate rollback to the previous revision
kubectl rollout undo deployment/api-server -n production

# Roll back to a specific revision number
kubectl rollout undo deployment/api-server --to-revision=1 -n production

# Watch the rollback proceed
kubectl rollout status deployment/api-server -n production

# Pause a rollout mid-flight (e.g. to inspect partial canary traffic)
kubectl rollout pause deployment/api-server -n production

# Resume a paused rollout
kubectl rollout resume deployment/api-server -n production

# See all ReplicaSets for a Deployment (old ones kept for rollback)
kubectl get replicaset -n production -l app=api-server
# NAME                          DESIRED   CURRENT   READY   AGE
# api-server-7d9f4b6c9d         6         6         6       5m   <-- active RS
# api-server-5f8a3b2e1a         0         0         0       2d   <-- kept for rollback
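One detail that surprises people is what a rollback does to revision numbers: the restored Pod template does not keep its old number, it becomes the newest revision. The sketch below models that bookkeeping only. The rollback function is hypothetical, and the history is reduced to a mapping from revision number to image tag.

```python
from typing import Optional


def rollback(history: dict, to_revision: Optional[int] = None) -> dict:
    """Return the revision history after rolling back (default: previous revision)."""
    revisions = sorted(history)
    current = revisions[-1]
    if to_revision is None:
        if len(revisions) < 2:
            raise ValueError("no previous revision to roll back to")
        to_revision = revisions[-2]
    if to_revision not in history:
        raise KeyError(f"revision {to_revision} not found")
    if to_revision == current:
        return dict(history)  # already running this template
    # The target template is re-recorded as a new, highest revision.
    updated = {rev: tpl for rev, tpl in history.items() if rev != to_revision}
    updated[current + 1] = history[to_revision]
    return updated


history = {1: "myrepo/api-server:2.4.1", 2: "myrepo/api-server:2.5.0"}
print(rollback(history))
# {2: 'myrepo/api-server:2.5.0', 3: 'myrepo/api-server:2.4.1'}
```

In this model, running the rollback a second time would return to 2.5.0, so it is worth checking kubectl rollout history before repeating an undo.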

Production practice: set revisionHistoryLimit deliberately. The default of 10 old ReplicaSets consumes etcd storage and clutters kubectl get replicaset output. High-frequency deploy pipelines (multiple deploys per day) can usually lower it, for example to spec.revisionHistoryLimit: 3. Keep enough history to cover your typical rollback window: if you release every hour and your MTTR is 30 minutes, 2 revisions are sufficient. Do not set it to 0 unless you are willing to give up rollback, because that removes all old ReplicaSets.
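The effect of the limit can be sketched as a pruning rule. This is an assumption-laden model rather than the controller's implementation: replicasets_to_prune is a hypothetical function, the ReplicaSet names are made up, and it assumes that only ReplicaSets scaled to zero are candidates and that the oldest revisions are removed first.

```python
def replicasets_to_prune(replicasets: list, revision_history_limit: int) -> list:
    """Return the names of old ReplicaSets that exceed the history limit."""
    # Only ReplicaSets scaled to zero count as history (assumption of this sketch).
    old = sorted(
        (rs for rs in replicasets if rs["replicas"] == 0),
        key=lambda rs: rs["revision"],
    )
    excess = max(0, len(old) - revision_history_limit)
    return [rs["name"] for rs in old[:excess]]


replicasets = [
    {"name": "api-server-rev1", "revision": 1, "replicas": 0},
    {"name": "api-server-rev2", "revision": 2, "replicas": 0},
    {"name": "api-server-rev3", "revision": 3, "replicas": 0},
    {"name": "api-server-rev4", "revision": 4, "replicas": 0},
    {"name": "api-server-rev5", "revision": 5, "replicas": 6},  # active ReplicaSet
]
print(replicasets_to_prune(replicasets, revision_history_limit=3))  # ['api-server-rev1']
```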

Deployment Strategies Beyond RollingUpdate

Kubernetes natively supports two strategies set via spec.strategy.type:

RollingUpdate (default) — incrementally replaces old Pods with new ones. Zero-downtime when configured correctly (maxUnavailable: 0). Both old and new code run concurrently during the transition — your API must handle this: backward-compatible schema migrations, no breaking changes within the rollout window.
Recreate — kills all old Pods before creating new ones. Causes a brief downtime window. Use only for workloads that cannot run two versions simultaneously (e.g. single-writer database tools, legacy apps with exclusive file locks).
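The practical difference between the two strategies comes down to the lowest number of ready Pods you can expect mid-rollout. A minimal sketch, using a hypothetical helper min_ready_during_rollout and assuming maxUnavailable is given as an absolute number:

```python
def min_ready_during_rollout(strategy: str, replicas: int, max_unavailable: int = 0) -> int:
    """Lower bound on ready Pods while a rollout is in progress."""
    if strategy == "Recreate":
        return 0  # all old Pods are terminated before any new Pod starts
    if strategy == "RollingUpdate":
        return replicas - max_unavailable
    raise ValueError(f"unknown strategy: {strategy}")


print(min_ready_during_rollout("RollingUpdate", replicas=6, max_unavailable=0))  # 6
print(min_ready_during_rollout("RollingUpdate", replicas=6, max_unavailable=2))  # 4
print(min_ready_during_rollout("Recreate", replicas=6))                          # 0
```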

More sophisticated strategies — canary (route a small percentage of traffic to the new version) and blue/green (maintain two full environments, flip the Service selector) — are layered on top using multiple Deployments with different label selectors, traffic-splitting Ingress controllers, or a service mesh like Istio. These are covered in the Networking lesson.

# Recreate strategy -- for single-instance tools that cannot run two versions at once
apiVersion: apps/v1
kind: Deployment
metadata:
  name: db-migrator
spec:
  replicas: 1
  strategy:
    type: Recreate           # all old pods terminated before new ones start
  selector:
    matchLabels:
      app: db-migrator
  template:
    metadata:
      labels:
        app: db-migrator
    spec:
      containers:
      - name: migrator
        image: myrepo/db-migrator:3.0.0

Scaling Deployments

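Changing the replica count takes effect quickly: Kubernetes persists the new spec.replicas in etcd, the Deployment controller updates the active ReplicaSet, and the ReplicaSet controller creates or deletes Pods to match. New Pods still need time to be scheduled, pull their images, and pass their readiness probes before they serve traffic. Manual scaling is useful for incident response; the Horizontal Pod Autoscaler (HPA) automates it based on CPU, memory, or custom metrics and is covered in a later tutorial.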

# Scale manually
kubectl scale deployment/api-server --replicas=12 -n production

# Scale to zero (removes all pods -- useful for temporarily disabling a service)
kubectl scale deployment/api-server --replicas=0 -n production

# Check current replica status
kubectl get deployment api-server -n production
# NAME         READY   UP-TO-DATE   AVAILABLE   AGE
# api-server   12/12   12           12          3d

# Describe gives you events -- essential for diagnosing stuck rollouts
kubectl describe deployment api-server -n production

Production pitfall: Recreate on user-facing services. Some teams set strategy.type: Recreate on high-traffic Deployments because it is simpler to reason about. During a deploy, Kubernetes terminates all old Pods before starting new ones, so you get a hard outage at least as long as your container startup time (often 15-60 seconds for JVM or Python apps). Prefer RollingUpdate with maxUnavailable: 0 for user-facing services, and invest in fast startup times so your readiness probe initial delay can be under 5 seconds.

The readinessProbe is Non-Negotiable

A rolling update without a readiness probe is dangerous. Without it, Kubernetes adds a new Pod to the Service endpoint list the moment it starts — before your app has finished initializing, running migrations, or warming caches. The first real user request hits an unready Pod and fails. Always configure readinessProbe on every container in every Deployment. The probe should test actual business readiness (database connectivity, cache warm-up, dependent service reachable), not just whether the process is alive.
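On the application side, the two probe paths used in the manifest earlier (/healthz/ready and /healthz/live) can be served by very little code. The following is a minimal sketch using only the Python standard library. The ready flag, HealthHandler, and probe are hypothetical names, and a real service would set the flag only after its own initialization checks succeed. The kubelet treats HTTP status codes from 200 to 399 as success, so returning 503 keeps the Pod out of the Service endpoints.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

ready = threading.Event()  # set once initialization has finished


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz/live":
            status = 200  # the process is up and able to answer
        elif self.path == "/healthz/ready":
            status = 200 if ready.is_set() else 503
        else:
            status = 404
        self.send_response(status)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the example output quiet


def probe(port: int, path: str) -> int:
    """Issue a GET like an HTTP probe would and return the status code."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # ignore proxies
    try:
        with opener.open(f"http://127.0.0.1:{port}{path}", timeout=2) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


server = HTTPServer(("127.0.0.1", 0), HealthHandler)  # port 0: pick any free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

before = probe(port, "/healthz/ready")  # still initializing
ready.set()                             # e.g. after the database connection succeeds
after = probe(port, "/healthz/ready")
live = probe(port, "/healthz/live")
server.shutdown()
server.server_close()
print(before, after, live)  # 503 200 200
```

Keeping liveness and readiness on separate paths lets the application report "alive but not ready" during startup without being restarted.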