Advanced Docker & Container Security

Project: A Hardened Production Image


The previous nine lessons each targeted one dimension of container security in isolation. This capstone project synthesises all of them into a single, end-to-end workflow: take a realistic Node.js API, and drive it through multi-stage minimisation, image scanning, supply-chain signing, and non-root hardening. The result is a production image you could defend in a security review at any top-tier company.

We will work with a concrete application structure and apply every hardening step in order, showing the exact commands, the measurable improvement after each step, and the integration points for a CI/CD pipeline.

Starting Point: The Naive Image

Most teams start here — a single-stage Dockerfile based on the official full Node image, running as root:

# BEFORE — naive, insecure baseline
FROM node:20

WORKDIR /app
COPY . .
RUN npm install
EXPOSE 3000
CMD ["node", "src/index.js"]

A quick assessment of this image reveals three immediate problems: it is ~1.1 GB, it runs as root (UID 0), and it includes every development dependency plus the full npm toolchain. Grype finds around 300 CVEs in the base image alone.

Step 1 — Multi-Stage Build to Minimise Attack Surface

The first transformation separates build-time tooling from runtime assets. Production node_modules only includes packages listed under dependencies in package.json, not devDependencies.

# syntax=docker/dockerfile:1.7
# ── Stage 0: install all deps + build (TypeScript compile / asset bundling) ──
FROM node:20-alpine AS builder
WORKDIR /app
COPY package.json package-lock.json tsconfig.json ./
RUN --mount=type=cache,target=/root/.npm \
    npm ci
COPY src ./src
RUN npm run build          # outputs compiled JS to /app/dist

# ── Stage 1: install production deps only ──────────────────────────────────
FROM node:20-alpine AS prod-deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN --mount=type=cache,target=/root/.npm \
    npm ci --omit=dev --ignore-scripts

# ── Stage 2: minimal runtime ────────────────────────────────────────────────
FROM node:20-alpine AS runtime
WORKDIR /app
ENV NODE_ENV=production
COPY --from=prod-deps /app/node_modules ./node_modules
COPY --from=builder   /app/dist         ./dist
EXPOSE 3000
CMD ["node", "dist/index.js"]

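Image size drops from ~1.1 GB to ~180 MB. CVE count drops by roughly 70%, largely because the Debian-based full image, with its compilers and system libraries, gives way to Alpine, and devDependencies never reach the runtime stage. The container still runs as root, though, which the next step fixes.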
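
To put the size drop in perspective, the reduction can be computed directly from the two figures quoted above. The helper below is a small illustrative sketch (the function name is hypothetical); it uses integer arithmetic, so the result is truncated rather than rounded.

```shell
# Percentage reduction between two sizes given in the same unit (here MB).
# Integer arithmetic: the result is truncated, not rounded.
size_reduction_pct() {
  before="$1"
  after="$2"
  echo $(( (before - after) * 100 / before ))
}

size_reduction_pct 1100 180   # ~1.1 GB -> ~180 MB, prints 83
```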

Step 2 — Non-Root User & Read-Only Filesystem

The node:20-alpine image ships with a pre-created node user (UID 1000). All we need to do is give that user ownership of the application files and switch to it before the entrypoint:

# Stage 2 continued — add user and filesystem hardening
FROM node:20-alpine AS runtime
WORKDIR /app
ENV NODE_ENV=production

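# Copy the artifacts already owned by the non-root node user.
# --chown on COPY avoids the duplicate layer a separate `chown -R` would create.
COPY --chown=node:node --from=prod-deps /app/node_modules ./node_modules
COPY --chown=node:node --from=builder   /app/dist         ./dist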

USER node

# Declare a tmpdir so the process can write transient files if needed
VOLUME ["/tmp"]

EXPOSE 3000
CMD ["node", "dist/index.js"]

When deploying to Kubernetes, pair this with a container-level securityContext that locks the workload down further:

# kubernetes/deployment.yaml (securityContext block under spec.template.spec.containers[])
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  runAsGroup: 1000
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  seccompProfile:
    type: RuntimeDefault
  capabilities:
    drop:
      - ALL

Why readOnlyRootFilesystem: true matters. A container with a writable root can be exploited to overwrite its own binary, install a backdoor, or write a cron job. Read-only root forces all writes to explicit emptyDir or persistent volume mounts, which ops teams can audit. An application that cannot handle a read-only root has a hidden assumption worth fixing at the code level.
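
Outside Kubernetes, roughly the same restrictions can be applied with plain docker run flags, which is handy for testing read-only behaviour locally before deploying. The sketch below only prints the command rather than running it; the function name is hypothetical, and the tmpfs mount at /tmp is an assumption about where the app needs scratch space.

```shell
# Print a docker run command that approximates the securityContext above.
#   --read-only                       ~ readOnlyRootFilesystem: true
#   --user 1000:1000                  ~ runAsUser / runAsGroup
#   --cap-drop ALL                    ~ capabilities.drop: [ALL]
#   --security-opt no-new-privileges  ~ allowPrivilegeEscalation: false
#   --tmpfs /tmp                      ~ an emptyDir mounted at /tmp (assumed path)
# Docker applies its default seccomp profile unless told otherwise.
hardened_run_cmd() {
  image="$1"
  echo "docker run --rm --read-only --tmpfs /tmp --user 1000:1000" \
       "--cap-drop ALL --security-opt no-new-privileges $image"
}
```

Running `hardened_run_cmd "myapi:$GIT_SHA" | sh` executes the printed command; an app that crashes under it is likely to crash under the Kubernetes settings as well.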

Step 3 — Scan Before You Push

Integrate Grype into your CI pipeline so a build with critical CVEs fails before the image ever reaches the registry:

# Build the image
docker build --tag myapi:$GIT_SHA .

# Scan: exit non-zero if any CVE is severity CRITICAL (use --fail-on high to also fail on HIGH)
grype myapi:$GIT_SHA \
  --fail-on critical \
  --output table

# Optionally produce a machine-readable SBOM first, then scan the SBOM
# (avoids re-pulling from the daemon for downstream auditing)
syft myapi:$GIT_SHA -o spdx-json > sbom.spdx.json
grype sbom:sbom.spdx.json --fail-on critical
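
The threshold semantics are easy to get wrong: --fail-on fails the scan when any finding is at or above the given severity. The sketch below models that comparison; the function names are hypothetical, and the ranking is a simplified model of Grype's severity levels, not its implementation.

```shell
# Map a severity name to a rank (simplified model of the severity ordering).
severity_rank() {
  case "$(echo "$1" | tr '[:upper:]' '[:lower:]')" in
    negligible) echo 1 ;;
    low)        echo 2 ;;
    medium)     echo 3 ;;
    high)       echo 4 ;;
    critical)   echo 5 ;;
    *)          echo 0 ;;   # unknown or unrecognised severity
  esac
}

# Succeeds (exit 0) when a finding of severity $1 would trip threshold $2.
would_fail_build() {
  [ "$(severity_rank "$1")" -ge "$(severity_rank "$2")" ]
}

# A High finding against --fail-on critical, then against --fail-on high
would_fail_build High critical && echo "blocked" || echo "allowed"
would_fail_build High high     && echo "blocked" || echo "allowed"
```

The first call prints "allowed" and the second "blocked", which is why a critical-only gate still lets High findings through.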

In GitHub Actions this looks like:

# .github/workflows/build.yml (scan step)
- name: Scan image for vulnerabilities
  uses: anchore/scan-action@v3
  with:
    image: "myapi:${{ github.sha }}"
    fail-build: true
    severity-cutoff: critical
    output-format: table

Never scan only at build time and consider it done. CVEs are disclosed continuously, so an image that passes today may carry a newly disclosed critical vulnerability next week. Run scheduled nightly re-scans against all images currently running in production, using your registry's built-in scanning where available (Amazon Inspector for ECR, Google Artifact Analysis, Harbor's Trivy adapter), and alert your security channel when the severity threshold is breached.
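
Where registry-side scanning is not available, a scheduled job can loop over the deployed images itself. The following is a minimal sketch under stated assumptions: the list of running images lives in a text file (one reference per line, a hypothetical convention), and both function names are illustrative. Any non-zero exit from the scanner, including a failed pull, is treated as an alert.

```shell
# Scan one image; isolated in a function so another scanner can be swapped in.
# stdin is redirected so the scanner cannot consume the image list being read.
scan_image() {
  grype "$1" --fail-on critical >/dev/null 2>&1 </dev/null
}

# Read image references (one per line, '#' comments allowed) from a file,
# scan each, print an ALERT line per failure, and return non-zero if any failed.
rescan_images() {
  list_file="$1"
  failed=0
  while IFS= read -r image || [ -n "$image" ]; do
    case "$image" in
      ''|'#'*) continue ;;   # skip blank lines and comments
    esac
    if ! scan_image "$image"; then
      echo "ALERT: $image exceeds the severity threshold"
      failed=$((failed + 1))
    fi
  done < "$list_file"
  [ "$failed" -eq 0 ]
}
```

A nightly cron or scheduled CI job could call `rescan_images running-images.txt` and forward any ALERT lines to the security channel.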

Step 4 — Sign the Image with Cosign (Sigstore)

Signing closes the supply-chain gap between CI and production. Without signing, nothing prevents someone from pushing a different image under the same tag to your registry. With Cosign keyless signing (backed by Sigstore's transparency log), every image carries a cryptographic receipt that ties it to the specific GitHub Actions workflow run that produced it.

# Install cosign (via brew or curl the binary)
# brew install cosign

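# In CI: keyless signing (OIDC token from GitHub Actions)
# cosign sign uses the ACTIONS_ID_TOKEN_REQUEST_URL env var automatically;
# the job needs `permissions: id-token: write` for that variable to be set

# Push the image first
docker push ghcr.io/org/myapi:$GIT_SHA

# Resolve the pushed digest and sign that rather than the mutable tag
# (assumes the image was pushed to a single repository)
IMAGE_REF=$(docker inspect --format '{{index .RepoDigests 0}}' ghcr.io/org/myapi:$GIT_SHA)

# Sign (keyless: no key management required, uses Fulcio + Rekor)
cosign sign --yes "$IMAGE_REF"

# Verify on any machine (substitute the expected OIDC issuer + subject)
# --certificate-identity is an exact match; the -regexp variant treats "." as a wildcard
cosign verify \
  --certificate-oidc-issuer https://token.actions.githubusercontent.com \
  --certificate-identity "https://github.com/org/myapi/.github/workflows/build.yml@refs/heads/main" \
  ghcr.io/org/myapi:$GIT_SHA | jq .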
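
The identity that keyless verification checks follows a fixed shape for a GitHub Actions workflow defined in the same repository: repository URL, workflow file path, and git ref. A small helper such as the hypothetical one below can assemble it, which helps keep the verify command and any admission policy in sync.

```shell
# Build the certificate identity recorded for a GitHub Actions workflow:
#   https://github.com/<owner>/<repo>/.github/workflows/<file>@<ref>
workflow_identity() {
  repo="$1"       # e.g. org/myapi
  workflow="$2"   # e.g. build.yml
  ref="$3"        # e.g. refs/heads/main
  echo "https://github.com/${repo}/.github/workflows/${workflow}@${ref}"
}

workflow_identity org/myapi build.yml refs/heads/main
```

The output can be passed straight to cosign, for example `--certificate-identity "$(workflow_identity org/myapi build.yml refs/heads/main)"`.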

In Kubernetes, enforce this at admission with Policy Controller or Kyverno, so unsigned images are rejected before they ever reach a node:

# kyverno/verify-image-policy.yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-signed-images
spec:
  validationFailureAction: Enforce
  rules:
    - name: verify-image-signature
      match:
        any:
          - resources:
              kinds: [Pod]
      verifyImages:
        - imageReferences:
            - "ghcr.io/org/myapi:*"
          attestors:
            - entries:
                - keyless:
                    subject: "https://github.com/org/myapi/.github/workflows/build.yml@refs/heads/main"
                    issuer: "https://token.actions.githubusercontent.com"

The Complete Hardening Pipeline

The diagram below shows all four stages wired together in a single CI/CD flow, from a developer push to a verified, policy-compliant image running in production.

Figure: The complete hardened image pipeline. Every stage gates the next, so only minimal, scanned, signed, and non-root images reach production.

Verification Checklist

After building your hardened image, run through this checklist before tagging it as production-ready. Mature platform teams enforce a checklist like this automatically in CI, so that failing any check blocks the deployment.

# 1. Confirm non-root
docker run --rm myapi:$GIT_SHA id
# Expected: uid=1000(node) gid=1000(node) — NOT uid=0(root)

# 2. Check image size
docker image inspect myapi:$GIT_SHA --format '{{.Size}}' | numfmt --to=iec
# Target: <200 MB for Node.js services, <30 MB for Go/distroless

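# 3. Check for a shell in the runtime image
docker run --rm --entrypoint sh myapi:$GIT_SHA -c "echo hi" 2>&1 || echo "No shell: good"
# node:20-alpine ships BusyBox sh, so this prints "hi" for the image built above;
# "No shell: good" appears only with a distroless or scratch-based runtime

# 4. List processes running as root inside a running container
docker run -d --name myapi-check myapi:$GIT_SHA
docker exec myapi-check ps aux | grep -w root
docker rm -f myapi-check
# Expected: no output from grep (only your app process, running as UID 1000)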

# 5. SBOM exists
syft myapi:$GIT_SHA -o table | head -20

# 6. Signature is valid
cosign verify \
  --certificate-oidc-issuer https://token.actions.githubusercontent.com \
  --certificate-identity-regexp "https://github.com/org/myapi/.github/workflows/.*" \
  ghcr.io/org/myapi:$GIT_SHA

# 7. Scan final image
grype myapi:$GIT_SHA --fail-on high
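
To enforce the checklist in CI rather than by hand, the individual commands can be wrapped in a tiny runner that tallies failures. This is a sketch, not a finished tool: run_check and size_within_budget are hypothetical helper names, and the 200 MiB budget mirrors the target in check 2.

```shell
failures=0

# Run a named check; any command that exits non-zero counts as a failure.
run_check() {
  name="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS  $name"
  else
    echo "FAIL  $name"
    failures=$((failures + 1))
  fi
}

# Succeeds when a size in bytes is within a budget given in MiB.
size_within_budget() {
  [ "$1" -le $(( $2 * 1024 * 1024 )) ]
}

# Example wiring (commented out because it needs Docker and a built image):
# IMAGE="myapi:$GIT_SHA"
# run_check "non-root"     sh -c '[ "$(docker run --rm "$1" id -u)" != "0" ]' _ "$IMAGE"
# run_check "size budget"  size_within_budget "$(docker image inspect "$IMAGE" --format '{{.Size}}')" 200
# run_check "no high CVEs" grype "$IMAGE" --fail-on high
# [ "$failures" -eq 0 ] || exit 1
```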

Pin base image by digest, not by tag. Tags are mutable — node:20-alpine can change without warning when the upstream maintainer pushes a patch. Pin to the immutable digest in your Dockerfile and update it deliberately via Dependabot or Renovate:

FROM node:20-alpine@sha256:a7f5...

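Digest pinning also complements cosign verification. The signature covers the final image's manifest digest, which in turn covers every layer including the base layers, so any tampering after signing changes the digest and verification fails. Pinning the base ensures the layers you scan and sign are the ones you deliberately chose to build on.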
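
A CI lint can catch unpinned base images before they merge. The function below is a rough sketch (its name is hypothetical): it lists every FROM line whose image lacks an @sha256: digest, skipping scratch and references to earlier build stages. It does not expand ARG variables, so treat it as a first-pass check only.

```shell
# Print "<line number>: <image>" for each FROM that is not pinned by digest.
unpinned_from_lines() {
  awk 'toupper($1) == "FROM" {
         img = $2
         if (img ~ /^--platform=/) img = $3          # skip an optional --platform flag
         if (img != "scratch" && img !~ /@sha256:[0-9a-f]+/ && !(img in stages))
           print NR ": " img
         for (i = 2; i < NF; i++)                    # remember "AS <stage>" names
           if (toupper($i) == "AS") stages[$(i + 1)] = 1
       }' "$1"
}
```

In a pipeline, `[ -z "$(unpinned_from_lines Dockerfile)" ] || exit 1` would fail the build whenever a tag-only base image slips in.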

Production Failure Modes to Know

Even well-hardened images surface runtime surprises. These are the most common failures teams hit after switching to this setup:

App writes to /app at runtime. Many frameworks write lock files, compiled templates, or upload staging directories under the working directory. With readOnlyRootFilesystem: true, these writes fail with EROFS (read-only file system). Fix: mount an emptyDir at the specific writable path and point the framework config to it, not the working directory.
Missing CA certificates. Alpine ships a CA bundle, but if you start from scratch or a stripped distroless variant, TLS connections from tools that rely on the system trust store fail with errors such as certificate signed by unknown authority (Node.js itself bundles its own root CAs by default). Fix: COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/.
Port binding below 1024 as non-root. Binding ports 80 and 443 traditionally requires CAP_NET_BIND_SERVICE. Running as UID 1000 with all capabilities dropped and trying to bind port 80 typically fails with permission denied (EACCES). Fix: bind on port 3000 (or any unprivileged port) and let the Kubernetes Service or load balancer handle port 80/443 externally.
Signal handling. When node runs as PID 1 (with no init process), the kernel applies no default signal actions, so a SIGTERM sent during kubectl rollout restart is ignored unless the app registers a handler, and orphaned child processes are never reaped and linger as zombies. Fix: install dumb-init (apk add --no-cache dumb-init on Alpine) and use it as the entrypoint wrapper: ["dumb-init", "node", "dist/index.js"].

Security hardening is not a one-time event. A hardened image built today can be un-hardened by next sprint if a developer adds a USER root to fix a permissions issue or upgrades a base image without re-scanning. Enforce these controls in CI, in admission webhooks, and in your runtime security tool (Falco, Sysdig). Treat a running container that spawns a shell or writes to unexpected paths as an incident, not a misconfiguration.
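
The USER root regression mentioned above is straightforward to catch in CI. As a sketch under stated assumptions (both function names are hypothetical, and the check deliberately requires an explicit USER in the final stage instead of trusting whatever the base image sets), a Dockerfile lint might look like this:

```shell
# Print the last USER value set in the final build stage (empty if none).
final_user() {
  awk 'toupper($1) == "FROM" { user = "" }   # a new stage resets the user
       toupper($1) == "USER" { user = $2 }
       END { print user }' "$1"
}

# Succeeds only when the final stage explicitly switches to a non-root user.
runs_as_non_root() {
  u=$(final_user "$1")
  u=${u%%:*}                # drop an optional :group suffix
  case "$u" in
    ''|root|0) return 1 ;;
    *)         return 0 ;;
  esac
}
```

A pipeline step such as `runs_as_non_root Dockerfile || exit 1` blocks the merge; it complements, but does not replace, the runAsNonRoot check at admission.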

You now have a repeatable, auditable path from source code to a production image that passes a security review at big-tech standards: minimal (180 MB vs 1.1 GB), non-root (UID 1000), scanned (zero critical CVEs), signed (cosign + Rekor), and policy-enforced (Kyverno admission). Run this workflow on every merge to main and you have a continuous, measurable security posture — not a point-in-time audit.