Docker & Containerization

Image Size & Build Hygiene

A bloated Docker image is not just an aesthetic problem. It slows pulls on cold deploys, enlarges the vulnerability surface, inflates container registry storage costs, and slows every CI pipeline run. At the scale of companies like Google, Amazon, and Netflix, shaving 200 MB off a base image can translate into thousands of dollars of saved bandwidth and measurably faster canary rollouts. This lesson covers the four pillars of production-grade image hygiene: choosing the right base image, excluding unnecessary files with .dockerignore, understanding multi-stage builds as a size strategy, and linting Dockerfiles to catch mistakes before they reach production.

Choosing the Right Base Image

The single highest-leverage decision for image size is the base image. The same application built on node:20, node:20-slim, or node:20-alpine can differ by several hundred megabytes, and by roughly 1 GB between the largest and smallest variants. Those are bytes your application never executes but your registry stores and your cluster pulls.

-alpine variants are built on Alpine Linux (~5 MB) and use musl libc instead of glibc. They are a common choice for statically compiled languages (Go, Rust) and for many Node.js or Python workloads. The trade-off: native extensions that assume glibc may fail to build or behave differently at runtime, so test your full dependency set on Alpine before committing to it in production.
-slim variants are Debian-based with most non-essential packages removed. They are the safer fallback when Alpine breaks native dependencies: larger than Alpine but fully glibc-compatible.
Distroless images (from Google, published under gcr.io/distroless rather than as a tag suffix) contain only the application runtime and its direct OS dependencies: no shell, no package manager, no utilities. An attacker who achieves code execution inside a distroless container has no bash, curl, or apt to reach for. They are used extensively at Google and increasingly by security-conscious teams elsewhere.
scratch is a completely empty base (zero bytes). It is used for single-binary Go or Rust programs that link everything statically, so the resulting image contains only your binary.
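
To make the scratch option concrete, here is a minimal sketch that writes a two-stage Dockerfile for a statically linked Go program. It assumes the build context holds a Go module with package main at its root; the file name Dockerfile.scratch, the golang:1.22 builder tag, and the /app path are choices made for this illustration, not part of the lesson's project.

```shell
# Write an illustrative Dockerfile for a static Go binary on the scratch base.
# Assumption: the build context is a Go module with package main at its root.
cat > Dockerfile.scratch << 'EOF'
# Build stage: full Go toolchain, never shipped
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
# CGO_ENABLED=0 avoids linking against libc, so the binary runs on an empty base
RUN CGO_ENABLED=0 go build -o /out/app .

# Runtime stage: nothing but the binary
FROM scratch
COPY --from=build /out/app /app
ENTRYPOINT ["/app"]
EOF
```

Build it with docker build -f Dockerfile.scratch -t app:scratch . and compare the result with docker image ls. Keep in mind that scratch has no CA certificates, no /tmp, and no shell, so a program that makes TLS calls needs the certificates copied in as well, or a distroless static base instead.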

# Approximate image sizes for a trivial Node.js app (2025 figures; confirm with docker image ls)
# node:20                               ~1.1 GB
# node:20-bookworm-slim                 ~240 MB
# gcr.io/distroless/nodejs20-debian12   ~115 MB
# node:20-alpine                        ~65 MB

# Pin to an exact digest in production rather than floating on a tag.
# Replace <digest> with the 64-hex-character value reported by:
#   docker buildx imagetools inspect node:20-alpine
FROM node:20-alpine@sha256:<digest> AS base

Pin base images by digest, not by tag. The tag node:20-alpine is mutable — it can be silently updated to a new image containing a regression or vulnerability. Pinning to @sha256:... guarantees you run exactly the same bytes in CI and in production. Your image-update workflow (Dependabot or Renovate) should be what bumps the digest, not a surprise on the next docker pull.
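
One way to enforce the pinning rule is a small check that flags FROM lines without a digest. The function below is a sketch, not a standard tool: the name list_unpinned_bases is made up, and it assumes a conventional Dockerfile where each FROM instruction sits on one line. It skips scratch and references to earlier build stages, since neither can be pinned.

```shell
# Print "line: image" for every FROM whose image is not pinned by digest.
# Hypothetical helper; skips scratch and references to earlier stage aliases.
list_unpinned_bases() {
  awk '
    toupper($1) == "FROM" {
      # Skip optional flags such as --platform=linux/amd64
      i = 2
      while (i <= NF && $i ~ /^--/) i++
      image = $i
      # Remember the stage alias, if any (FROM image AS alias)
      alias = ""
      if (toupper($(i + 1)) == "AS") alias = $(i + 2)
      if (image != "scratch" && !(image in stages) && image !~ /@sha256:[0-9a-f]+$/) {
        printf "%d: %s\n", NR, image
      }
      if (alias != "") stages[alias] = 1
    }
  ' "$1"
}
```

Running list_unpinned_bases Dockerfile prints nothing when every external base image carries a digest, which makes it easy to wire into a CI step that fails on any output.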

The .dockerignore File

Every file in the build context that is not excluded is available to the build. The legacy builder sends all of it to the Docker daemon before the first instruction runs; BuildKit transfers only what the Dockerfile references, but a COPY . . references everything. On a typical Node.js or Laravel project, the default context (the whole repository) includes node_modules (hundreds of megabytes), .git (tens of megabytes of history), test fixtures, local .env secrets, IDE configuration files, and CI YAML. All of this slows the build and risks leaking secrets into image layers if a COPY . . instruction runs before you realize the scope.

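A .dockerignore file at the root of the build context uses glob patterns similar to .gitignore, with one important difference: patterns are matched relative to the context root. A pattern such as *.md excludes only top-level Markdown files, and **/*.md is needed to match at any depth. With that in mind, a template looks like this: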

# .dockerignore  —  production-grade template for a Node.js / Next.js project

# Version control — never needed in an image
.git
.gitignore
.gitattributes

# Local environment secrets — must never enter an image layer
.env
.env.*
!.env.example

# Dependencies (rebuilt inside the image from package-lock.json)
**/node_modules
npm-debug.log*
yarn-error.log*

# Test and QA artefacts
**/__tests__
**/*.test.ts
**/*.spec.ts
coverage
.nyc_output
jest.config.*
cypress

# Build artefacts produced by the host (not needed; image builds its own)
dist
build
.next
out

# IDE and OS noise
.vscode
.idea
**/.DS_Store
Thumbs.db

# CI / CD configs — not needed at runtime
.github
.gitlab-ci.yml
Jenkinsfile
Makefile
docker-compose*.yaml
docker-compose*.yml

# Documentation
docs
*.md
LICENSE

A missing .dockerignore can leak secrets into your image layers. If your build context includes .env files and your Dockerfile runs COPY . . early (before a RUN that deletes them), those secrets are baked into the layer and extractable by anyone who can pull the image — even if a subsequent instruction removes the file. Always create .dockerignore before you write a single COPY instruction.
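
Before writing ignore rules, it helps to see what is actually heavy in the build context. The helper below is a rough sketch: the name context_top_entries is made up, and it simply ranks top-level entries (including dot-directories such as .git) by disk usage. It does not read .dockerignore, so treat its output as a list of candidates to exclude, not as a report of what Docker will send.

```shell
# List the largest top-level entries of a directory, sizes in KiB.
# Hypothetical helper: context_top_entries [dir] [count]
context_top_entries() {
  dir=${1:-.}
  count=${2:-10}
  # The two globs cover regular entries and dot-entries such as .git;
  # errors from a glob that matches nothing are discarded.
  ( cd "$dir" && du -sk -- * .[!.]* 2>/dev/null | sort -rn | head -n "$count" )
}
```

For example, context_top_entries . 5 in a Node.js repository will usually put node_modules and .git at the top, which are the first two entries most .dockerignore files exclude.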

Multi-Stage Builds as a Size Strategy

Multi-stage builds are the most powerful size-reduction technique available. The concept: use a fat builder image with all compiler toolchains, test runners, and dev dependencies to produce your application artifact, then copy only that artifact into a minimal runtime image. The builder stage never ships to users: it may stay in the local build cache, but none of its layers are part of the final image.

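The following Dockerfile applies the pattern to a Node.js application. Every stage starts from node:20-alpine, so the saving comes from leaving devDependencies, source files, and build tooling out of the final stage rather than from switching base images: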

# syntax=docker/dockerfile:1.7
# Dockerfile for a production Node.js API

# ── Stage 1: dependency install ──────────────────────────────────────────────
FROM node:20-alpine AS deps
WORKDIR /app
# Copy manifests only — layer is cached as long as these files do not change
COPY package.json package-lock.json ./
RUN npm ci --omit=dev

# ── Stage 2: build ───────────────────────────────────────────────────────────
FROM node:20-alpine AS build
WORKDIR /app
COPY package.json package-lock.json ./
# Install ALL deps including devDependencies (TypeScript compiler, etc.)
RUN npm ci
COPY . .
# Emits compiled JS into /app/dist
RUN npm run build

# ── Stage 3: production runtime ──────────────────────────────────────────────
FROM node:20-alpine AS production
WORKDIR /app

# Run as non-root (principle of least privilege)
RUN addgroup -S appgroup && adduser -S appuser -G appgroup

# Copy only what the runtime needs
COPY --from=deps  /app/node_modules ./node_modules
COPY --from=build /app/dist         ./dist
COPY --from=build /app/package.json ./

USER appuser
EXPOSE 3000
CMD ["node", "dist/index.js"]

What makes this pattern work at scale:

Layer cache optimization: copying package.json and package-lock.json before the rest of the source means the npm ci layer is invalidated only when dependencies change, not on every source edit. This is one of the most impactful Dockerfile cache optimizations and one of the most commonly omitted.
Only prod deps ship: npm ci --omit=dev excludes TypeScript, Jest, ESLint, and every other dev tool. For many projects that removes the majority of node_modules by size.
Non-root user: USER appuser in the final stage limits what a compromised process can do inside the container and makes a container escape harder to turn into root on the host. The CIS Docker Benchmark and most enterprise security policies call for it.
No build tools in the runtime image: the build stage has TypeScript and webpack; the final production stage has neither. An attacker cannot weaponize a compiler they cannot reach.

Three-stage build: prod dependencies and compiled output flow into a minimal runtime image. Builder stages are discarded after the build completes.
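
To check what the pattern buys you, build the final stage and look at its size and layers. The function below is a sketch that assumes Docker is installed and that the multi-stage Dockerfile from this section sits in the current directory; the function name report_image_size and the default tag api:prod are made up for illustration.

```shell
# Build the "production" stage and report its size and layer breakdown.
# Hypothetical helper: report_image_size [tag]
report_image_size() {
  tag=${1:-api:prod}
  docker build --target production -t "$tag" . || return 1
  # Total size of the final image
  docker image ls --format '{{.Repository}}:{{.Tag}}  {{.Size}}' "$tag"
  # Size contributed by each instruction; the largest rows are where to look first
  docker history "$tag"
}
```

Running the same function after changing --target production to --target build in a copy of it is a quick way to see how much the builder stage would have added had it shipped.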

Linting Dockerfiles with Hadolint

Hadolint is a widely used Dockerfile linter: a static analysis tool that checks your Dockerfile against a ruleset derived from Docker's best-practice guidance and runs ShellCheck on the shell code inside RUN instructions. Many teams run it in CI as a required gate before an image is built.

# Install hadolint (macOS / Linux)
brew install hadolint                         # macOS
# or pull the container — no install needed
docker run --rm -i hadolint/hadolint < Dockerfile

# Typical CI usage (GitHub Actions step, YAML)
# A block scalar (run: |) keeps the backslash line continuations intact
- name: Lint Dockerfile
  run: |
    docker run --rm -i hadolint/hadolint hadolint \
      --failure-threshold warning \
      - < Dockerfile

# Run with a custom config to suppress specific rules
cat > .hadolint.yaml << 'EOF'
failure-threshold: warning
ignore:
  - DL3008   # apt-get install without version pinning (sometimes acceptable)
  - DL3018   # apk add without pinning (same)
EOF
hadolint Dockerfile

Common Hadolint rules worth knowing by name:

DL3006: FROM without an explicit tag, which implicitly resolves to latest. Always specify a version.
DL3007: Using the latest tag explicitly. Same problem, explicit form.
DL3008 / DL3018: apt-get install or apk add without pinning package versions. Breaks reproducibility on a cache miss.
DL3009: apt-get lists not deleted after installing. Leaves package index files in the layer.
DL3015: apt-get install without --no-install-recommends. Pulls in dozens of transitive packages you do not need.
DL3025: JSON form of CMD / ENTRYPOINT not used. Shell form wraps your command in sh -c, so signals (like SIGTERM from Kubernetes) go to the shell rather than to your process. The shell typically does not forward them, and the container is killed only after the termination grace period (30 seconds by default in Kubernetes) expires on every rolling deploy.
SC2086: (from ShellCheck) Unquoted variable in shell, a latent word-splitting bug.

Add Hadolint as a pre-commit hook and a CI required check. Running it locally (via pre-commit or a Makefile target) catches issues in seconds. Running it in CI as a required status check ensures no Dockerfile that fails the standard ever reaches your registry. Many teams also add docker scout cves or trivy image as a second CI gate to catch CVEs in base images after the build — layer hygiene and vulnerability scanning are complementary, not alternatives.
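
A pre-commit setup for this can be as small as the configuration below. It is a sketch under two assumptions: that you use the pre-commit framework, and that the rev shown is a Hadolint release tag you have confirmed still exists (bump it as needed). Note that the command overwrites any existing .pre-commit-config.yaml, so merge by hand if you already have one.

```shell
# Write a pre-commit configuration that lints Dockerfiles with Hadolint.
# Assumption: the rev below is a valid Hadolint release tag; verify before use.
cat > .pre-commit-config.yaml << 'EOF'
repos:
  - repo: https://github.com/hadolint/hadolint
    rev: v2.12.0
    hooks:
      # hadolint-docker runs the linter in a container, so contributors
      # do not need a local hadolint binary
      - id: hadolint-docker
EOF
```

After that, pre-commit install registers the hook in the local clone, and pre-commit run --all-files lints every Dockerfile once to establish a baseline.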

Additional Build Hygiene Practices

Beyond base image selection, .dockerignore, multi-stage builds, and linting, several smaller habits distinguish a production-quality Dockerfile from a demo one:

Combine RUN instructions that belong together (for example, apt-get update && apt-get install && rm -rf /var/lib/apt/lists/* in a single RUN). Each RUN creates a layer; splitting them means the update cache and the installed packages live in separate layers, and the rm -rf cleanup in a later layer does not actually reduce the image size (the bytes are still in the earlier layer).
Clean package manager caches in the same RUN — apt-get clean, rm -rf /var/lib/apt/lists/*, pip install --no-cache-dir, npm ci && npm cache clean --force. If you clean in a later layer, the cache bytes are already committed.
Declare LABEL metadata — at minimum org.opencontainers.image.source, .version, and .revision so tooling can trace an image back to its source commit and pipeline run.
Use COPY not ADD unless you specifically need ADD's URL-fetching or tar-extraction features. COPY is explicit and predictable; ADD has surprising auto-extraction behavior.
Set a HEALTHCHECK so Docker can detect a process that is running but not serving traffic. Compose relies on it for depends_on with condition: service_healthy. Kubernetes ignores the Dockerfile HEALTHCHECK, so define liveness and readiness probes in the pod spec there.
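
The habits above can be seen together in one small runtime-stage Dockerfile. This is an illustrative sketch, not the lesson's reference image: server.js, the /healthz endpoint, the example.com source URL, the GIT_SHA build argument, and the file name Dockerfile.hygiene are all hypothetical stand-ins for your own values. Port 3000 matches the earlier example.

```shell
# Write an illustrative Dockerfile that applies the build hygiene habits.
# Assumption: server.js and the /healthz endpoint exist in your project.
cat > Dockerfile.hygiene << 'EOF'
FROM node:20-bookworm-slim

# Update, install, and clean up in ONE layer so the apt lists never persist
RUN apt-get update \
 && apt-get install -y --no-install-recommends curl \
 && rm -rf /var/lib/apt/lists/*

# OCI labels let tooling trace the image back to its source commit
ARG GIT_SHA=unknown
LABEL org.opencontainers.image.source="https://example.com/your-org/your-repo" \
      org.opencontainers.image.revision="${GIT_SHA}"

WORKDIR /app
# COPY, not ADD: no URL fetching or tar extraction is needed here
COPY package.json package-lock.json ./
RUN npm ci --omit=dev && npm cache clean --force
COPY server.js ./

USER node
# Mark the container unhealthy if the app stops answering HTTP
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  CMD curl -fsS http://localhost:3000/healthz || exit 1
CMD ["node", "server.js"]
EOF
```

A build such as docker build -f Dockerfile.hygiene --build-arg GIT_SHA="$(git rev-parse HEAD)" . stamps the commit into the revision label. Hadolint will still flag the unpinned curl package (DL3008), which is the kind of rule a team either satisfies by pinning or suppresses deliberately in .hadolint.yaml.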