Every concept from this tutorial — ConfigMaps, Secrets, liveness/readiness/startup probes, resource requests and limits, and Horizontal Pod Autoscaling — exists to solve a concrete production problem. In isolation each piece is easy to understand; in combination, at scale, the interactions are what trip teams up. This project wires all of them together into a single, deployable, battle-ready workload that you could ship to a real cluster today.
We will build a stateless API service called order-api. It reads non-sensitive runtime configuration from a ConfigMap, injects database credentials and a JWT signing key from a Secret as environment variables, exposes HTTP health endpoints consumed by Kubernetes probes, declares CPU/memory budgets that the scheduler and kernel enforce, and scales horizontally under load. Every decision below mirrors what senior engineers write at companies running thousands of pods per cluster.
Step 1 — Namespace and Supporting Objects
Always isolate workloads in their own namespace. This gives you RBAC boundaries, resource quotas, and clean kubectl get all -n orders output.
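The Deployment in Step 2 expects a Namespace called orders, a ConfigMap called order-api-config and a Secret called order-api-secret. A minimal sketch of those three objects follows. The Secret keys (DB_USER, DB_PASSWORD, JWT_SIGNING_KEY) come from the Deployment and the APP_ENV and DB_HOST keys from the verification commands in Step 4; LOG_LEVEL and every value shown are hypothetical placeholders to replace with your own.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: orders
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: order-api-config
  namespace: orders
data:
  # Every key here becomes an env var via envFrom in the Deployment.
  APP_ENV: "production"                   # placeholder value
  DB_HOST: "orders-db.example.internal"   # hypothetical hostname
  LOG_LEVEL: "info"                       # hypothetical extra key
---
apiVersion: v1
kind: Secret
metadata:
  name: order-api-secret
  namespace: orders
type: Opaque
# stringData takes plain text; the API server stores it base64-encoded
# under .data. Base64 is an encoding, not encryption: bootstrap use only.
stringData:
  DB_USER: "order_api"             # placeholder
  DB_PASSWORD: "change-me"         # placeholder: never commit a real value
  JWT_SIGNING_KEY: "change-me-too" # placeholder
```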
In real clusters, Secret values are managed outside the manifest (AWS Secrets Manager via ASCP, HashiCorp Vault Agent Injector, or Sealed Secrets). Writing plain-text values into a Secret's stringData field is fine for bootstrapping, but rotate those credentials immediately after the first deploy and integrate a secret-sync controller before you call the service production-ready.
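As one example of the external route, here is a minimal sketch assuming HashiCorp Vault with the Vault Agent Injector already installed and a Kubernetes auth role bound to the pod's service account. The role name order-api and the secret path are hypothetical. The injector reads these annotations on the pod template and renders each secret to a file under /vault/secrets/, so the application would read credentials from that file instead of the DB_USER/DB_PASSWORD environment variables used in Step 2.

```yaml
# Fragment: merge into spec.template.metadata of the order-api Deployment.
metadata:
  annotations:
    vault.hashicorp.com/agent-inject: "true"
    # Vault role bound to the pod's service account (hypothetical name)
    vault.hashicorp.com/role: "order-api"
    # Renders this secret to /vault/secrets/db-creds (hypothetical KV path)
    vault.hashicorp.com/agent-inject-secret-db-creds: "secret/data/orders/db"
```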
Step 2 — The Deployment Manifest
This is the core of the project. Read through every annotated field — each one addresses a documented production failure mode.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-api
  namespace: orders
  labels:
    app: order-api
    version: "1.0.0"
spec:
  replicas: 2                  # HPA will override this at runtime
  selector:
    matchLabels:
      app: order-api
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1              # one extra pod during rollout
      maxUnavailable: 0        # zero downtime: never kill a pod before its replacement is Ready
  template:
    metadata:
      labels:
        app: order-api
        version: "1.0.0"
    spec:
      terminationGracePeriodSeconds: 30   # time to drain in-flight requests
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 2000
      containers:
        - name: api
          image: registry.example.com/order-api:1.0.0
          imagePullPolicy: Always
          ports:
            - containerPort: 8080
              name: http
          # ── Config from ConfigMap (envFrom = all keys at once) ───────────
          envFrom:
            - configMapRef:
                name: order-api-config
          # ── Secrets injected as individual env vars ──────────────────────
          env:
            - name: DB_USER
              valueFrom:
                secretKeyRef:
                  name: order-api-secret
                  key: DB_USER
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: order-api-secret
                  key: DB_PASSWORD
            - name: JWT_SIGNING_KEY
              valueFrom:
                secretKeyRef:
                  name: order-api-secret
                  key: JWT_SIGNING_KEY
          # ── Resource budget ───────────────────────────────────────────────
          resources:
            requests:
              cpu: "250m"
              memory: "256Mi"
            limits:
              cpu: "1000m"       # 1 vCPU: throttled, not killed
              memory: "512Mi"    # OOM-killed if exceeded; size carefully
          # ── Startup probe: give JVM start-up / migrations time to finish ──
          startupProbe:
            httpGet:
              path: /healthz/startup
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 24     # 5 s + 24 × 5 s ≈ 2 min max startup window
            successThreshold: 1
          # ── Liveness probe: restart the container if it deadlocks ─────────
          livenessProbe:
            httpGet:
              path: /healthz/live
              port: 8080
            initialDelaySeconds: 0   # startup probe gates this; 0 is safe here
            periodSeconds: 15
            timeoutSeconds: 3
            failureThreshold: 3      # 3 × 15 s ≈ 45 s of failures before restart
          # ── Readiness probe: gate traffic until the DB conn pool is warm ──
          readinessProbe:
            httpGet:
              path: /healthz/ready
              port: 8080
            initialDelaySeconds: 0
            periodSeconds: 5
            timeoutSeconds: 2
            failureThreshold: 3      # 3 × 5 s: removed from Service endpoints after ~15 s
      # Spread pods across failure domains (requires Kubernetes 1.19+)
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: order-api
```
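One of the most common rollout incidents is exactly what this manifest guards against. With maxUnavailable: 1 and only replicas: 2, the Deployment controller may terminate an old pod as soon as the rollout starts, so for a while a single Ready pod absorbs all traffic at half capacity. If that pod slows down or fails its own probe while the new pod is still starting up, the Service has no healthy endpoints left. Always set maxUnavailable: 0 on low-replica Deployments; the cost is one extra pod slot (maxSurge: 1) during rollout. Kubernetes rejects a strategy in which maxSurge and maxUnavailable are both 0, so that surge slot is mandatory.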
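A rough trace of both settings for this manifest's two replicas, written as comments on the strategy stanza the Deployment uses. This is a simplified sketch of the Deployment controller's behavior; the exact interleaving depends on probe timing.

```yaml
# Safe (this manifest): replicas: 2, maxSurge: 1, maxUnavailable: 0
#   1. surge      old Ready: 2   new: 1 starting             Ready total: 2
#   2. new Ready  old Ready: 1   new Ready: 1 (1 old ends)   Ready total: 2
#   3. surge      old Ready: 1   new Ready: 1 + 1 starting   Ready total: 2
#   4. done       old: 0         new Ready: 2                Ready total: 2
#
# Risky: replicas: 2, maxSurge: 1, maxUnavailable: 1
#   1. first step old Ready: 1   new: 1 starting             Ready total: 1
#      (one pod now carries all traffic until the new pod is Ready)
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 0
```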
Step 3 — Service and HPA
```yaml
# ClusterIP Service: stable DNS for in-cluster consumers
apiVersion: v1
kind: Service
metadata:
  name: order-api-svc
  namespace: orders
spec:
  selector:
    app: order-api
  ports:
    - port: 80
      targetPort: 8080
      name: http
  type: ClusterIP
---
# Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: order-api-hpa
  namespace: orders
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-api
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 75
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 min before shrinking
      policies:
        - type: Percent
          value: 25
          periodSeconds: 60             # remove at most 25% per minute
    scaleUp:
      stabilizationWindowSeconds: 0     # scale up immediately
      policies:
        - type: Pods
          value: 4
          periodSeconds: 60             # add at most 4 pods per minute
```
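The scaleDown block is critical. Without a brake on shrinking, the HPA can drop from 20 pods back toward 2 the moment a traffic spike subsides, only for the next spike to find a cold fleet with insufficient capacity. This rapid up-and-down scaling is usually called flapping (or thrashing), and it hurts p99 latency. The 300-second stabilization window makes the HPA act on the highest replica recommendation seen in the last five minutes; it matches the Kubernetes default, so stating it explicitly mainly documents intent and keeps it from being tuned away. The 25%-per-minute policy does the rest: the default scale-down policy allows removing up to 100% of pods every 15 seconds once the window has passed.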
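The HPA's core rule, as described in the Kubernetes documentation, computes a desired replica count for each metric and acts on the largest. A worked example against this manifest's targets, where the 4 pods, 90% CPU and 50% memory figures are illustrative rather than measured:

$$
\text{desiredReplicas} = \left\lceil \text{currentReplicas} \times \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
$$

$$
\text{cpu: } \left\lceil 4 \times \frac{90}{60} \right\rceil = 6, \qquad
\text{memory: } \left\lceil 4 \times \frac{50}{75} \right\rceil = \lceil 2.67 \rceil = 3, \qquad
\max(6,\,3) = 6
$$

Utilization is measured against the container's request, so 90% CPU here means an average of 225m per pod (0.9 × 250m). Growing from 4 to 6 pods adds 2, within the scaleUp limit of 4 pods per minute. The HPA also leaves the count alone while the current-to-target ratio stays within its default tolerance of 0.1. On the way down, a 20-pod fleet loses at most 5 pods (25% of 20) in the first minute after the stabilization window allows shrinking.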
Step 4 — Deploy and Verify
```bash
# Apply everything in one shot (manifest files in ./manifests/).
# kubectl processes a directory's files in filename order and namespaced objects
# fail if the Namespace does not exist yet, so make the Namespace file sort first.
kubectl apply -f manifests/ -n orders

# Watch the rollout: each pod must show 1/1 READY (one container per pod)
kubectl rollout status deployment/order-api -n orders
kubectl get pods -n orders -w

# Confirm env from ConfigMap and Secret reached the container
kubectl exec -n orders deploy/order-api -- env | grep -E 'APP_ENV|DB_HOST|DB_USER'

# Check probe status (look at the Events section for any probe failures)
kubectl describe pod -n orders -l app=order-api

# Inspect the HPA: the TARGETS column shows current vs. target utilization
kubectl get hpa -n orders -w

# Simulate load to trigger HPA scale-up
kubectl run load-gen --image=busybox -n orders --rm -it --restart=Never -- \
  /bin/sh -c "while true; do wget -q -O- http://order-api-svc/api/orders; done"

# After the test, watch the HPA scale back down: the first reduction waits for the
# 5-minute stabilization window, then pods are removed at most 25% per minute
kubectl get hpa order-api-hpa -n orders -w
```
Architecture Overview
All workload primitives wired together: HPA drives replica count, ConfigMap and Secret inject config, probes guard traffic, and the Service provides a stable endpoint.
Production Failure Modes to Know
Senior engineers are distinguished by knowing what breaks, not just what works. Here are the failure modes that bite teams most often with this exact setup:
Probe endpoint is too expensive. If /healthz/ready executes a database query on every call and Kubernetes polls every 5 seconds across 50 pods, that is 600 DB round-trips per minute from health checks alone. Keep probes lightweight — check an in-memory flag that your app sets after the DB connection pool warms up, not the DB itself.
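Memory limit too close to real usage. The scheduler places pods by their request (256 Mi here), but the kernel enforces the limit (512 Mi). If the JVM's heap plus metaspace, thread stacks and buffers grows past 512 Mi under real load, the OOM killer terminates the container; a pod running well above its request is also an early eviction candidate when the node comes under memory pressure. Measure usage under realistic load, then set the request near steady-state usage and the limit with clear headroom above peak (1.5–2x the request is a common starting point), or use Guaranteed QoS (request == limit for both CPU and memory) for latency-critical services.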
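If the service is JVM-based, as the startup-probe comment in Step 2 suggests, two adjustments help. The container-level sketch below assumes a JDK that supports -XX:MaxRAMPercentage (JDK 10+ or 8u191+) and that nothing else in the image sets JAVA_TOOL_OPTIONS. It caps the heap as a fraction of the container limit so non-heap memory has room, and sets requests equal to limits for Guaranteed QoS. The 500m and 768Mi figures are placeholders for numbers from your own load tests.

```yaml
# Fragment: merge into the api container of the Deployment
# (append the env entry to the existing env list).
env:
  # The JVM reads JAVA_TOOL_OPTIONS at startup; 75% of the container's memory
  # limit goes to the heap, leaving room for metaspace, thread stacks and buffers.
  - name: JAVA_TOOL_OPTIONS
    value: "-XX:MaxRAMPercentage=75.0"
resources:
  # requests == limits for every resource in every container => Guaranteed QoS
  requests:
    cpu: "500m"       # placeholder
    memory: "768Mi"   # placeholder
  limits:
    cpu: "500m"
    memory: "768Mi"
```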
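HPA cannot scale because metrics-server is missing. Run kubectl top pods; if it errors, the resource metrics API is unavailable, usually because metrics-server is not installed. The HPA then cannot compute utilization: kubectl get hpa shows <unknown> in the TARGETS column and kubectl describe hpa reports FailedGetResourceMetric events, but nothing alerts you unless you look. Install it before you need it: kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml.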
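Rolling update stalls after a Secret rotation. Environment variables sourced from a Secret are resolved once, when the container starts, so updating the Secret changes nothing in running pods. Old pods keep serving with the previous credentials until the database revokes them; pods created afterwards pick up the new values, and if those values are wrong or not yet accepted by the database, the new pods fail readiness and, with maxUnavailable: 0, the rollout stalls while the old pods carry on. Solution: once the new credentials are confirmed working, restart the Deployment (kubectl rollout restart deployment/order-api -n orders) so every pod is replaced with the current values.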
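One way to make rotation roll pods automatically is to create each credential version as a new, immutable Secret and point the Deployment at it. Because the Secret name is part of the pod template, changing it triggers an ordinary rolling update that honours maxUnavailable: 0. This is a sketch assuming your tooling (CI, or Kustomize's secretGenerator, which appends a content hash to the name) produces the new name; the -v2 suffix is a hypothetical convention. Immutable Secrets are GA since Kubernetes 1.21.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: order-api-secret-v2    # hypothetical versioned name
  namespace: orders
type: Opaque
immutable: true                # data can no longer be edited in place
stringData:
  DB_USER: "order_api"         # placeholder values
  DB_PASSWORD: "rotated-value"
  JWT_SIGNING_KEY: "rotated-key"
# In the Deployment, each secretKeyRef then names the new version, e.g.:
#   secretKeyRef:
#     name: order-api-secret-v2
#     key: DB_PASSWORD
```

Keep the previous Secret until the rollout completes: a container that restarts under the old pod template still resolves its env vars from the old name.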
The combination of maxUnavailable: 0, topologySpreadConstraints across zones, an HPA minReplicas of at least 2, and a startup probe with a generous failureThreshold is a widely used zero-downtime deployment pattern. Each guard is cheap to add and expensive to retrofit after an incident.
What to Carry Forward
This workload is intentionally stateless. The patterns here — envFrom ConfigMap, individual Secret env vars, three-tier probes, request/limit ratio, HPA with asymmetric scale behavior, and topology spread — apply to almost every Kubernetes microservice you will write in your career. The next natural extension is adding a PodDisruptionBudget (minAvailable: 1) so cluster drain operations during node maintenance cannot take down all replicas simultaneously. That, plus a network policy restricting ingress to only the Ingress controller, completes a production-ready perimeter.
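A minimal sketch of both extensions. The PodDisruptionBudget uses the minAvailable: 1 suggested above. The NetworkPolicy assumes the Ingress controller runs in a namespace called ingress-nginx (a hypothetical name; adjust it to your cluster), relies on the kubernetes.io/metadata.name label that Kubernetes 1.21+ sets on every namespace, and only takes effect if your CNI plugin enforces NetworkPolicy.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: order-api-pdb
  namespace: orders
spec:
  minAvailable: 1   # voluntary evictions (e.g. kubectl drain) keep at least one pod
  selector:
    matchLabels:
      app: order-api
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: order-api-allow-ingress-controller
  namespace: orders
spec:
  podSelector:
    matchLabels:
      app: order-api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx   # hypothetical namespace
      ports:
        - protocol: TCP
          port: 8080   # the container port; policies match pod ports, not Service ports
```

With two replicas and minAvailable: 1, a node drain can evict only one order-api pod at a time. Note that this policy also blocks the in-cluster consumers the ClusterIP Service was created for, as well as the load-gen pod from Step 4; add further from entries for any namespaces that must call the API directly.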