DevOps Culture & Fundamentals

Infrastructure Evolution: Servers to Serverless

Every deployment decision you make as a DevOps engineer is shaped by the infrastructure model your team has chosen. Over the last 30 years the industry has moved through five stages: bare metal, virtual machines, containers, container orchestration, and serverless. Each one solved real problems introduced by the one before it. This lesson traces that journey from the perspective of someone who has operated each layer in production.

Figure: the eras of infrastructure. Each layer builds on lessons from the previous one.

Era 1 — Bare Metal

A bare-metal server is a physical machine with one operating system installed directly on hardware. There is no virtualization layer. You own every CPU cycle, every byte of RAM, and every quirk of that specific machine.

For the right workloads — HPC clusters, low-latency financial trading engines, databases that push hardware limits — bare metal is still the correct choice in 2025. But for general application hosting it has three fatal flaws:

Slow provisioning: racking, cabling, BIOS config, and OS install take days to weeks.
No isolation between applications: with no hypervisor or container boundary, a single runaway process that eats all the RAM takes down every app on the box.
Poor utilization: commonly cited estimates put average CPU utilization on dedicated bare-metal servers under 20%.
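
To make the utilization point concrete, here is a rough back-of-the-envelope sketch. The 20% figure is the one quoted in the list above; the fleet size of 100 servers and the 60% consolidation target are illustrative assumptions, not measurements.

```python
def hosts_needed(current_hosts: int, current_util_pct: int, target_util_pct: int) -> int:
    """Hosts required to carry the same total load at a higher average utilization."""
    if not (0 < current_util_pct <= 100 and 0 < target_util_pct <= 100):
        raise ValueError("utilization percentages must be in (0, 100]")
    total_load = current_hosts * current_util_pct  # load in "host-percent" units
    return -(-total_load // target_util_pct)       # ceiling division on integers

# Assumed example: 100 servers averaging 20% CPU, consolidated to a 60% target.
print(hosts_needed(100, 20, 60))  # 34
```

Under those assumptions roughly two thirds of the hardware is idle capacity, which is the gap that virtualization set out to close.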

Era 2 — Virtual Machines

VMware ESX (2001), Xen (2003), and KVM (2007) are hypervisors that let a single physical host run many fully isolated operating systems simultaneously. AWS EC2 (2006) turned this into a utility: pay per hour, provision in minutes. This was revolutionary.

A VM bundles a full OS kernel, kernel drivers, system libraries, and your application into a single image (AMI on AWS, VMDK for VMware). That completeness is also the cost: a vanilla Ubuntu image is on the order of 1 GB, and boot time typically runs from tens of seconds to several minutes. At Netflix scale, waiting three minutes for a new VM to start serving traffic during a spike is unacceptable.

Key concept: hypervisor types. Type 1 (bare-metal hypervisor) runs directly on hardware: VMware ESXi, Xen, Hyper-V. Type 2 runs on top of a host OS: VirtualBox, VMware Workstation. Public clouds use Type 1. EC2 originally ran on Xen and began moving to its own Nitro System in 2017, which pairs a lightweight KVM-based hypervisor with purpose-built hardware cards that offload networking, storage, and management for near-zero overhead.

Era 3 — Containers

Docker (2013) made Linux kernel namespaces and cgroups accessible to every developer. Both predate Docker: namespaces arrived incrementally from 2002 onward, and cgroups were merged into the kernel in 2008. The insight: applications do not need a separate kernel, they just need isolation. Containers share the host kernel but see their own filesystem, PID space, network interface, and resource limits.

The practical result: a container image is often 10–100× smaller than a comparable VM image, typically starts in about a second or less, and lets you pack up to 10× more workloads onto the same hardware. A Dockerfile makes the runtime environment reproducible, which ends most "works on my machine" arguments.

Dockerfile — production-grade multi-stage build (Node.js API)
# Stage 1: production dependencies only (these are what the runtime image ships)
FROM node:20-alpine AS deps
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev

# Stage 2: build application (needs devDependencies such as the compiler or bundler)
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
# Add node_modules and dist to .dockerignore so this copy does not overwrite the install above
COPY . .
RUN npm run build

# Stage 3: minimal runtime image
FROM node:20-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production
# Non-root user: critical for container security
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
COPY --from=builder /app/dist ./dist
COPY --from=deps /app/node_modules ./node_modules
USER appuser
EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s CMD wget -qO- http://localhost:3000/health || exit 1
CMD ["node", "dist/server.js"]

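Multi-stage builds should be the default in production. They keep dev tools (compilers, test runners) and intermediate build files out of the final image, which shrinks attack surface and image size. They also keep credentials used in earlier stages (npm tokens, build-time credentials) out of the final image's layers, although values passed through ARG or ENV can still linger in build cache and history, so BuildKit secret mounts (RUN --mount=type=secret) are the safer way to supply them. As an illustration, a React app built naively in a single stage can weigh in around 1.2 GB, while the multi-stage version can come to roughly 42 MB.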

Era 4 — Container Orchestration

Running one container on one machine is simple. Running 10,000 containers across 500 nodes is not: self-healing when nodes die, routing traffic to healthy replicas, rolling updates with zero downtime, and secrets injection all require an orchestrator. Kubernetes (K8s), open-sourced by Google in 2014 and inspired by its internal Borg system, became the industry standard.

The Kubernetes object model maps directly onto production requirements:

Pod — the smallest deployable unit; 1–N containers that share a network namespace and storage volumes.
Deployment — declares desired state (e.g. "5 replicas of this container image"); K8s reconciles reality to match.
Service — a stable virtual IP + DNS name in front of a dynamic set of Pods (load balancing, service discovery).
HorizontalPodAutoscaler (HPA) — scales replica count based on CPU/memory/custom metrics.
Namespace — logical cluster partition; multi-team environments use one namespace per team/environment.
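
The phrase "K8s reconciles reality to match" describes a control loop: observe the current state, compare it with the declared state, and act on the difference. The sketch below is a toy model of that idea, not Kubernetes code; the Cluster class, its methods, and the Pod naming scheme are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Cluster:
    """Toy stand-in for cluster state: just the names of running Pods."""
    pods: set[str] = field(default_factory=set)
    _ids: count = field(default_factory=count, repr=False)

    def start_pod(self, app: str) -> None:
        self.pods.add(f"{app}-{next(self._ids)}")

    def stop_pod(self, name: str) -> None:
        self.pods.discard(name)


def reconcile(cluster: Cluster, app: str, desired_replicas: int) -> None:
    """One pass of the loop: compare desired with observed state, then act."""
    running = sorted(p for p in cluster.pods if p.startswith(f"{app}-"))
    diff = desired_replicas - len(running)
    for _ in range(max(diff, 0)):           # too few Pods: start the missing ones
        cluster.start_pod(app)
    for name in running[:max(-diff, 0)]:    # too many Pods: stop the surplus
        cluster.stop_pod(name)


cluster = Cluster()
reconcile(cluster, "api-service", 5)        # declared state: 5 replicas
cluster.stop_pod(sorted(cluster.pods)[0])   # simulate a Pod lost with its node
reconcile(cluster, "api-service", 5)        # the next pass replaces it
print(len(cluster.pods))  # 5
```

A real controller runs this comparison continuously and reacts to change events, which is why you declare the end state rather than scripting the steps to reach it.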

Kubernetes Deployment + HPA (production pattern)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
  namespace: production
spec:
  replicas: 3                # starting count; once the HPA below is active it owns this field
  selector:
    matchLabels:
      app: api-service
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0      # zero-downtime: never kill old before new is ready
      maxSurge: 1
  template:
    metadata:
      labels:
        app: api-service
    spec:
      containers:
        - name: api
          image: registry.example.com/api-service:v2.4.1   # always pin exact tag, never :latest
          ports:
            - containerPort: 3000
          resources:
            requests:
              cpu: "250m"
              memory: "256Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
          readinessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 15
            periodSeconds: 20
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-service-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
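
The scaling decision behind the HPA manifest above follows the formula given in the Kubernetes documentation: desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), skipped when the ratio is within a tolerance of 1.0 (0.1 by default) and clamped to minReplicas and maxReplicas. The sketch below covers only that core arithmetic; stabilization windows, missing metrics, and not-yet-ready Pods are left out, and the function name is made up for this example.

```python
import math


def desired_replicas(current_replicas: int, current_util: float, target_util: float,
                     min_replicas: int, max_replicas: int, tolerance: float = 0.1) -> int:
    """Core HPA arithmetic for a single utilization metric."""
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas              # close enough to target: do nothing
    wanted = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, wanted))


# Values from the manifest: target 60% CPU, between 3 and 20 replicas.
print(desired_replicas(3, 90, 60, 3, 20))    # 5  (scale up)
print(desired_replicas(20, 120, 60, 3, 20))  # 20 (capped at maxReplicas)
print(desired_replicas(10, 15, 60, 3, 20))   # 3  (scale down to minReplicas)
```

Utilization here is measured against each container's CPU request (250m in the Deployment), not its limit, so the request value directly shapes when scaling kicks in.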

Kubernetes cluster: the Control Plane reconciles desired state; kubelets on Worker Nodes run Pods; Services provide stable virtual IPs for routing.
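
The rollingUpdate settings in the Deployment manifest (maxUnavailable: 0, maxSurge: 1) bound how many Pods may exist and how many must stay available during a rollout. The simulation below is a simplified sketch of those two constraints. It assumes every new Pod becomes ready immediately, which real rollouts do not; the function name and the step model are illustrative only.

```python
def rolling_update(replicas: int, max_surge: int, max_unavailable: int) -> list[tuple[int, int]]:
    """Return (old_pods, new_pods) counts after each step of a simplified rollout."""
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    old, new = replicas, 0
    steps = [(old, new)]
    while old > 0 or new < replicas:
        # Start new Pods, never exceeding replicas + maxSurge Pods in total.
        new += min(replicas + max_surge - (old + new), replicas - new)
        # Retire old Pods, never dropping below replicas - maxUnavailable Pods.
        old -= min(old, (old + new) - (replicas - max_unavailable))
        steps.append((old, new))
    return steps


print(rolling_update(3, max_surge=1, max_unavailable=0))
# [(3, 0), (2, 1), (1, 2), (0, 3)]
```

With maxUnavailable set to 0 the total never falls below three Pods, at the price of briefly running a fourth.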

Production pitfall: never use the :latest image tag. A tag is a mutable pointer, and for :latest Kubernetes defaults to imagePullPolicy: Always. If you deploy image: my-app:latest and a node is replaced or a Pod restarts, the new Pod pulls whatever :latest points to at that moment while other Pods still run the older image. You get a mixed-version cluster with no reliable way to roll back, because the previous version has no name of its own. Always pin to an immutable tag like a git commit SHA (image: my-app:a3f1d9c) or a semantic version (image: my-app:v2.4.1). Your CI/CD pipeline should build and push the tag, then update the manifest.
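
One way to enforce this rule is a small check in CI that rejects unpinned image references before a manifest is applied. The function below is a minimal sketch of such a check; is_pinned is a hypothetical helper, and it validates only the shape of the reference, not whether the tag is actually immutable in your registry.

```python
import re

_SHA256_DIGEST = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_pinned(image: str) -> bool:
    """True if the reference has a digest or an explicit tag other than 'latest'."""
    if _SHA256_DIGEST.search(image):
        return True
    # Only a ':' in the last path component is a tag separator; an earlier one
    # is a registry port, as in registry.example.com:5000/my-app.
    last_component = image.rsplit("/", 1)[-1]
    _name, sep, tag = last_component.partition(":")
    return bool(sep) and tag not in ("", "latest")


for ref in ("my-app:latest", "my-app", "my-app:a3f1d9c",
            "registry.example.com/api-service:v2.4.1"):
    print(ref, is_pinned(ref))
```

An untagged reference such as my-app is rejected as well, because it resolves to :latest implicitly.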

Era 5 — Serverless & Functions as a Service

AWS Lambda (2014) introduced a new abstraction: upload a function, define a trigger, and pay only for execution time (originally billed per 100 ms, per 1 ms since December 2020). No servers to provision, no containers to manage, no idle capacity to pay for. The platform handles scaling from zero to thousands of concurrent executions automatically.

Serverless shines for event-driven, spiky, or infrequent workloads: image processing triggered by S3 uploads, API endpoints behind API Gateway, scheduled jobs, stream processing (Kinesis/SQS consumers). It is the wrong choice for long-running workloads (a Lambda invocation is capped at 15 minutes), stateful workloads, or latency-sensitive workloads where cold-start time (roughly 50–500 ms for a Node.js function, 1–10 s for a JVM function) is unacceptable.
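
The pay-per-execution model is easiest to reason about with numbers. Lambda charges per request plus per unit of memory-time (GB-seconds), with each invocation's duration rounded up to the billing increment. The sketch below models only that arithmetic; the function name is made up, and the rates in the example call are placeholders chosen for round numbers, so check current AWS pricing before relying on any result.

```python
import math


def lambda_monthly_cost(invocations: int, avg_duration_ms: float, memory_mb: int,
                        price_per_gb_second: float, price_per_million_requests: float,
                        granularity_ms: int = 1) -> float:
    """Estimated monthly cost: request charge plus memory-time charge."""
    # Duration is rounded up to the billing increment before it is charged.
    billed_ms = math.ceil(avg_duration_ms / granularity_ms) * granularity_ms
    gb_seconds = invocations * (billed_ms / 1000) * (memory_mb / 1024)
    request_cost = (invocations / 1_000_000) * price_per_million_requests
    return gb_seconds * price_per_gb_second + request_cost


# Placeholder rates (assumptions): 0.00002 per GB-second, 0.20 per million requests.
for increment in (100, 1):
    cost = lambda_monthly_cost(1_000_000, 120, 1024, 0.00002, 0.20,
                               granularity_ms=increment)
    print(f"{increment} ms increment: {cost:.2f}")
```

Because the cost scales with invocations, a function that is rarely called costs almost nothing, while a steadily busy one can end up costing more than an always-on container.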

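AWS Lambda + S3 trigger (Terraform IaC)
resource "aws_lambda_function" "image_resizer" {
  function_name    = "image-resizer"
  filename         = "dist/image-resizer.zip"
  source_code_hash = filebase64sha256("dist/image-resizer.zip")
  handler          = "index.handler"
  runtime          = "nodejs20.x"
  timeout          = 30
  memory_size      = 512
  # Required argument. The execution role is assumed to be defined elsewhere,
  # like the two S3 buckets referenced in this snippet; the name is a placeholder.
  role             = aws_iam_role.lambda_exec.arn

  environment {
    variables = {
      BUCKET_OUT = aws_s3_bucket.processed.bucket
    }
  }

  # VPC placement: only use it when the function must reach VPC resources.
  # It adds networking setup (subnets, security groups, NAT for internet access).
  # The large cold-start penalty it once carried was mostly removed by AWS's 2019 VPC networking changes.
  # vpc_config { ... }
}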

resource "aws_lambda_permission" "s3_invoke" {
  statement_id  = "AllowS3Invoke"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.image_resizer.function_name
  principal     = "s3.amazonaws.com"
  source_arn    = aws_s3_bucket.uploads.arn
}

resource "aws_s3_bucket_notification" "trigger" {
  bucket = aws_s3_bucket.uploads.id
  lambda_function {
    lambda_function_arn = aws_lambda_function.image_resizer.arn
    events              = ["s3:ObjectCreated:*"]
    filter_suffix       = ".jpg"
  }
  depends_on = [aws_lambda_permission.s3_invoke]
}
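
When the notification above fires, Lambda invokes the function with an event that lists the affected objects. The sketch below shows how a handler might read that event. It is written in Python to illustrate the event shape only: the Terraform configuration uses the nodejs20.x runtime, so the real index.handler would be JavaScript, and the resizing and the write to BUCKET_OUT are omitted here.

```python
from urllib.parse import unquote_plus


def handler(event: dict, context: object) -> list[str]:
    """Collect the S3 URIs of the objects named in an S3 notification event."""
    uris = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded, with spaces sent as '+'.
        key = unquote_plus(record["s3"]["object"]["key"])
        uris.append(f"s3://{bucket}/{key}")
    return uris


# Hand-written sample event with a hypothetical bucket name and key.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "uploads"},
                "object": {"key": "photos/my+cat.jpg"}}}
    ]
}
print(handler(sample_event, None))  # ['s3://uploads/photos/my cat.jpg']
```

Decoding the key matters in practice: an upload named "my cat.jpg" would otherwise be looked up as "my+cat.jpg" and not found.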

Choosing the Right Abstraction

Real teams do not pick one model and use it everywhere. Netflix, for example, has publicly described running containers on its Titus platform alongside VM-based workloads on EC2 and event-driven pipelines on Lambda. The decision tree comes down to:

Need maximum hardware performance? Bare metal or dedicated hosts (EC2 bare-metal instances, EC2 Dedicated Hosts).
Need VM-level isolation or a legacy OS? VMs (EC2, GCE, Azure VM).
Running microservices or APIs with predictable traffic? Containers on Kubernetes (EKS, GKE, AKS).
Event-driven, spiky, or infrequent workloads? Serverless (Lambda, Cloud Functions, Azure Functions).
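
The four questions above can be read as an ordered set of checks. The sketch below encodes them that way; the Model enum, the parameter names, and the choice of containers as the fallback are assumptions made for illustration, and real decisions weigh more factors (team skills, cost, compliance) than three booleans.

```python
from enum import Enum


class Model(Enum):
    BARE_METAL = "bare metal or dedicated hosts"
    VM = "virtual machines"
    CONTAINERS = "containers on Kubernetes"
    SERVERLESS = "serverless functions"


def choose_model(*, needs_max_hardware_performance: bool = False,
                 needs_vm_isolation_or_legacy_os: bool = False,
                 event_driven_or_spiky: bool = False) -> Model:
    """Walk the decision list top to bottom and return the first match."""
    if needs_max_hardware_performance:
        return Model.BARE_METAL
    if needs_vm_isolation_or_legacy_os:
        return Model.VM
    if event_driven_or_spiky:
        return Model.SERVERLESS
    return Model.CONTAINERS  # steady microservices and APIs


print(choose_model(event_driven_or_spiky=True).value)  # serverless functions
print(choose_model().value)                            # containers on Kubernetes
```

The function is applied per workload, not per company, which matches the point that one organization usually runs several models side by side.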

Infrastructure as Code (IaC) is the foundation of all of this. Whether your infra is a bare-metal BIOS config script, a Terraform plan provisioning EC2 instances, a Helm chart for Kubernetes, or a SAM template for Lambda — treating infrastructure as versioned, reviewed, tested code is what makes DevOps work at scale. Lessons later in this course cover Terraform and Helm in depth; everything you learned here is the context those tools operate in.