Infrastructure Evolution: Servers to Serverless
Every deployment decision you make as a DevOps engineer is shaped by the infrastructure model your team has chosen. Over the last 30 years the industry has moved through five distinct eras (bare metal, virtual machines, containers, container orchestration, and serverless), each solving real problems introduced by the previous model. This lesson traces that journey with the honesty of someone who has operated each layer in production.
Era 1 — Bare Metal
A bare-metal server is a physical machine with one operating system installed directly on hardware. There is no virtualization layer. You own every CPU cycle, every byte of RAM, and every quirk of that specific machine.
For the right workloads — HPC clusters, low-latency financial trading engines, databases that push hardware limits — bare metal is still the correct choice in 2025. But for general application hosting it has three fatal flaws:
- Slow provisioning: racking, cabling, BIOS config, OS install — days to weeks.
- No fault isolation: every app on the box shares one OS, so if a runaway process eats all RAM, every app on the box dies.
- Poor utilization: commonly cited figures put average CPU utilization on bare metal under 20%.
Era 2 — Virtual Machines
VMware ESX (2001), Xen (2003), and KVM (2007) are hypervisors that let a single physical host run many fully isolated operating systems simultaneously. AWS EC2 (2006) turned this into a utility: pay per hour, provision in minutes. This was revolutionary.
A VM bundles a full OS kernel, kernel drivers, system libraries, and your application into a single image (AMI on AWS, VMDK for VMware). That completeness is also the cost: a vanilla Ubuntu image is ~1 GB, and booting one typically takes tens of seconds to minutes. At Netflix scale, waiting three minutes for a new VM to start serving requests during a traffic spike is unacceptable.
Era 3 — Containers
Docker (2013) made Linux kernel namespaces and cgroups accessible to every developer. Both predate Docker by years: mount namespaces landed in 2002 and cgroups were merged in 2008. The insight: applications do not need a separate kernel, they just need isolation. Containers share the host kernel but see their own filesystem, PID space, network interface, and resource limits.
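To make the isolation concrete: on Linux, a process's namespace membership is exposed as symbolic links under /proc/<pid>/ns, with targets such as pid:[4026531836]. Two processes share a namespace when the bracketed inode numbers match. The sketch below is a minimal illustration under that assumption, not part of any container runtime; the function names and the inode values in the demo are made up for this example, and read_namespaces only works on a Linux host.

```python
import os
import re

# Link targets under /proc/<pid>/ns look like "pid:[4026531836]".
NS_LINK = re.compile(r"^(?P<kind>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target: str) -> tuple[str, int]:
    """Split a namespace link target into (kind, inode)."""
    match = NS_LINK.match(target)
    if match is None:
        raise ValueError(f"not a namespace link: {target!r}")
    return match["kind"], int(match["inode"])


def read_namespaces(pid: str = "self") -> dict[str, int]:
    """Return {entry name: namespace inode} for a process (Linux only)."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    for name in os.listdir(ns_dir):
        _kind, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        result[name] = inode
    return result


def shared_namespaces(a: dict[str, int], b: dict[str, int]) -> set[str]:
    """Namespace entries for which two processes see the same namespace."""
    return {name for name in a.keys() & b.keys() if a[name] == b[name]}


if __name__ == "__main__":
    # Hypothetical inode values: a host process vs. a containerized one.
    host = {"pid": 4026531836, "net": 4026531840, "uts": 4026531838}
    container = {"pid": 4026532201, "net": 4026532204, "uts": 4026531838}
    print(shared_namespaces(host, container))  # {'uts'}
```

In this made-up comparison the two processes have separate PID and network namespaces but still share the UTS (hostname) namespace, which is the kind of partial isolation a container runtime configures explicitly.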
The practical result: a container image is often 10–100× smaller than a VM image, typically starts in under a second, and can let you pack up to 10× more workloads onto the same hardware. A Dockerfile makes the runtime environment reproducible, so there is no more "works on my machine."
Dockerfile — production-grade multi-stage build (Node.js API)

    # Stage 1: production dependencies only
    FROM node:20-alpine AS deps
    WORKDIR /app
    COPY package*.json ./
    RUN npm ci --omit=dev

    # Stage 2: build the application (the build usually needs devDependencies,
    # so this stage does its own full install)
    FROM node:20-alpine AS builder
    WORKDIR /app
    COPY package*.json ./
    RUN npm ci
    COPY . .
    RUN npm run build

    # Stage 3: minimal runtime image
    FROM node:20-alpine AS runner
    WORKDIR /app
    ENV NODE_ENV=production
    # Non-root user: critical for container security
    RUN addgroup -S appgroup && adduser -S appuser -G appgroup
    COPY --from=builder /app/dist ./dist
    # Ship only the production dependencies from stage 1
    COPY --from=deps /app/node_modules ./node_modules
    USER appuser
    EXPOSE 3000
    HEALTHCHECK --interval=30s --timeout=5s --start-period=10s \
      CMD wget -qO- http://localhost:3000/health || exit 1
    CMD ["node", "dist/server.js"]

Era 4 — Container Orchestration
Running one container on one machine is simple. Running 10,000 containers across 500 nodes — self-healing when nodes die, routing traffic to healthy replicas, rolling updates with zero downtime, secrets injection — requires an orchestrator. Kubernetes (K8s), released by Google in 2014 based on its internal Borg system, became the industry standard.
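The self-healing described above comes from a control loop: compare the desired state with what is actually running, then act on the difference. The sketch below is a toy simulation of that idea, assuming a cluster that is nothing more than a set of pod names. Cluster, reconcile, and the pod naming are invented for illustration and are not Kubernetes APIs.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Cluster:
    """Toy stand-in for cluster state: just the names of running pods."""
    running: set[str] = field(default_factory=set)
    _ids: count = field(default_factory=lambda: count(1))

    def start_pod(self) -> str:
        name = f"pod-{next(self._ids)}"
        self.running.add(name)
        return name

    def stop_pod(self, name: str) -> None:
        self.running.discard(name)


def reconcile(cluster: Cluster, desired_replicas: int) -> list[str]:
    """One pass of the loop: observe, diff against desired state, act."""
    actions = []
    diff = desired_replicas - len(cluster.running)
    for _ in range(diff):  # too few pods: start replacements
        actions.append(f"start {cluster.start_pod()}")
    # Too many pods: stop the surplus (no-op when diff >= 0).
    for name in sorted(cluster.running)[: max(-diff, 0)]:
        cluster.stop_pod(name)
        actions.append(f"stop {name}")
    return actions


if __name__ == "__main__":
    cluster = Cluster()
    print(reconcile(cluster, 3))  # initial rollout starts three pods
    cluster.stop_pod("pod-2")     # simulate a node dying
    print(reconcile(cluster, 3))  # the loop starts one replacement
    print(reconcile(cluster, 3))  # already converged: nothing to do
```

A real orchestrator runs this kind of loop continuously and per object type, which is why you declare what you want rather than scripting how to get there.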
The Kubernetes object model maps directly onto production requirements:
- Pod — the smallest deployable unit; 1–N containers that share a network namespace and storage volumes.
- Deployment — declares desired state (e.g. "5 replicas of this container image"); K8s reconciles reality to match.
- Service — a stable virtual IP + DNS name in front of a dynamic set of Pods (load balancing, service discovery).
- HorizontalPodAutoscaler (HPA) — scales replica count based on CPU/memory/custom metrics.
- Namespace — logical cluster partition; multi-team environments use one namespace per team/environment.
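As a rough illustration of how the HorizontalPodAutoscaler in the list above arrives at a replica count, the Kubernetes documentation describes the core rule as desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), skipped when the ratio is within a small tolerance of 1.0. The sketch below implements only that rule plus min/max clamping. It deliberately ignores stabilization windows, pod readiness, and multiple metrics, and the function name and default values are assumptions for this example.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Simplified HPA scaling rule: proportional scaling, then clamping."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        wanted = current_replicas  # close enough to target: do not scale
    else:
        wanted = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # 3 replicas averaging 90% CPU against a 60% target: scale out to 5.
    print(desired_replicas(3, 90.0, 60.0, min_replicas=3, max_replicas=20))  # 5
    # 62% against a 60% target is within tolerance: stay at 5.
    print(desired_replicas(5, 62.0, 60.0, min_replicas=3, max_replicas=20))  # 5
```

Because the rule is proportional, a low utilization target leaves headroom for spikes at the cost of running more replicas on average.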
Kubernetes Deployment + HPA (production pattern)

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: api-service
      namespace: production
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: api-service
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxUnavailable: 0   # zero-downtime: never kill old before new is ready
          maxSurge: 1
      template:
        metadata:
          labels:
            app: api-service
        spec:
          containers:
            - name: api
              image: registry.example.com/api-service:v2.4.1   # always pin exact tag, never :latest
              ports:
                - containerPort: 3000
              resources:
                requests:
                  cpu: "250m"
                  memory: "256Mi"
                limits:
                  cpu: "500m"
                  memory: "512Mi"
              readinessProbe:
                httpGet:
                  path: /health
                  port: 3000
                initialDelaySeconds: 5
                periodSeconds: 10
              livenessProbe:
                httpGet:
                  path: /health
                  port: 3000
                initialDelaySeconds: 15
                periodSeconds: 20
    ---
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: api-service-hpa
      namespace: production
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: api-service
      minReplicas: 3
      maxReplicas: 20
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 60

Warning: avoid the :latest image tag. If you deploy image: my-app:latest and a node is replaced, Kubernetes pulls the newest image on that node while other nodes still run the old one. You get a mixed-version cluster with no way to roll back. Always pin to an immutable tag like a git commit SHA (image: my-app:a3f1d9c) or a semantic version (image: my-app:v2.4.1). Your CI/CD pipeline should build and push the tag, then update the manifest.

Era 5 — Serverless & Functions as a Service
AWS Lambda (2014) introduced a new abstraction: upload a function, define a trigger, and pay only for execution time (billed in 100 ms increments at launch, per millisecond since December 2020). No servers to provision, no containers to manage, no idle capacity to pay for. The platform handles scaling from zero to thousands of concurrent executions automatically.
Serverless shines for event-driven, spiky, or infrequent workloads: image processing triggered by S3 uploads, API endpoints behind API Gateway, scheduled jobs, stream processing (Kinesis/SQS consumers). It is the wrong choice for long-running, stateful, or latency-sensitive workloads where cold-start time (50–500 ms for a Node.js function, 1–10 s for a JVM function) is unacceptable.
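One way to reason about "spiky or infrequent" is to estimate where pay-per-execution stops being cheaper than an always-on instance. The sketch below assumes a simple cost model of compute (GB-seconds) plus a per-request fee. The prices passed in are illustrative placeholders rather than quoted AWS rates, the function names are invented for this example, and free tiers, data transfer, and API Gateway charges are ignored.

```python
def lambda_monthly_cost(
    invocations: int,
    avg_duration_ms: float,
    memory_mb: int,
    price_per_gb_second: float,
    price_per_million_requests: float,
) -> float:
    """Estimated monthly cost: compute (GB-seconds) plus request charges."""
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    compute = gb_seconds * price_per_gb_second
    requests = invocations / 1_000_000 * price_per_million_requests
    return compute + requests


def break_even_invocations(
    always_on_monthly_cost: float,
    avg_duration_ms: float,
    memory_mb: int,
    price_per_gb_second: float,
    price_per_million_requests: float,
) -> float:
    """Monthly invocations at which the function costs as much as always-on."""
    per_invocation = lambda_monthly_cost(
        1, avg_duration_ms, memory_mb, price_per_gb_second, price_per_million_requests
    )
    return always_on_monthly_cost / per_invocation


if __name__ == "__main__":
    # Placeholder prices; substitute current figures for your region.
    prices = dict(price_per_gb_second=0.00002, price_per_million_requests=0.20)
    cost = lambda_monthly_cost(1_000_000, 100, 512, **prices)
    print(f"1M invocations: ${cost:.2f}")
    even = break_even_invocations(30.0, 100, 512, **prices)
    print(f"break-even vs. a $30/month instance: {even:,.0f} invocations")
```

With these placeholder numbers, a million 100 ms invocations at 512 MB cost roughly $1.20, and the function only matches a $30/month instance at around 25 million invocations per month. Below that volume the pay-per-use model wins; well above it, steady traffic favors always-on capacity.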
AWS Lambda + S3 trigger (Terraform IaC)

    resource "aws_lambda_function" "image_resizer" {
      function_name    = "image-resizer"
      filename         = "dist/image-resizer.zip"
      source_code_hash = filebase64sha256("dist/image-resizer.zip")
      handler          = "index.handler"
      runtime          = "nodejs20.x"
      # Execution role is a required argument. aws_iam_role.lambda_exec is a
      # hypothetical name, assumed to be defined elsewhere like the S3 buckets below.
      role             = aws_iam_role.lambda_exec.arn
      timeout          = 30
      memory_size      = 512

      environment {
        variables = {
          BUCKET_OUT = aws_s3_bucket.processed.bucket
        }
      }

      # VPC placement can add cold-start latency; only use it when you need VPC resources
      # vpc_config { ... }
    }

    # Allow S3 to invoke the function
    resource "aws_lambda_permission" "s3_invoke" {
      statement_id  = "AllowS3Invoke"
      action        = "lambda:InvokeFunction"
      function_name = aws_lambda_function.image_resizer.function_name
      principal     = "s3.amazonaws.com"
      source_arn    = aws_s3_bucket.uploads.arn
    }

    # Fire the function for every new .jpg object in the uploads bucket
    resource "aws_s3_bucket_notification" "trigger" {
      bucket = aws_s3_bucket.uploads.id

      lambda_function {
        lambda_function_arn = aws_lambda_function.image_resizer.arn
        events              = ["s3:ObjectCreated:*"]
        filter_suffix       = ".jpg"
      }

      depends_on = [aws_lambda_permission.s3_invoke]
    }

Choosing the Right Abstraction
Real teams do not pick one model and use it everywhere. Netflix, for example, has described running containers on its Titus platform, other workloads on VMs, and event-driven pipelines on Lambda. The decision tree comes down to:
- Need maximum hardware performance? Bare metal or dedicated hosts (EC2 Dedicated Hosts or bare-metal instances).
- Need VM-level isolation or a legacy OS? VMs (EC2, GCE, Azure VM).
- Running microservices or APIs with predictable traffic? Containers on Kubernetes (EKS, GKE, AKS).
- Event-driven, spiky, or infrequent workloads? Serverless (Lambda, Cloud Functions, Azure Functions).
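The decision tree above can be sketched as a small function. This is only an illustration of the ordering: the Workload fields, the order in which the questions are checked, and the choice of containers as the fallback are assumptions made for the example, and real decisions weigh cost, team skills, and compliance as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical workload profile; each flag mirrors one question above."""
    needs_max_hardware_performance: bool = False
    needs_vm_isolation_or_legacy_os: bool = False
    event_driven_or_spiky: bool = False


def choose_platform(workload: Workload) -> str:
    """Walk the questions in order and return the first matching model."""
    if workload.needs_max_hardware_performance:
        return "bare metal"
    if workload.needs_vm_isolation_or_legacy_os:
        return "virtual machines"
    if workload.event_driven_or_spiky:
        return "serverless"
    # Default for microservices and APIs with predictable traffic.
    return "containers on Kubernetes"


if __name__ == "__main__":
    print(choose_platform(Workload(event_driven_or_spiky=True)))  # serverless
    print(choose_platform(Workload()))  # containers on Kubernetes
```

Checking the hardest constraints first keeps the rule simple: a workload that needs raw hardware or a legacy OS cannot move up the abstraction ladder no matter how spiky its traffic is.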