Artifact Management & Release Engineering

Reproducible & Hermetic Builds

At big-tech scale, "it works on my machine" is not a defense — it is an incident report waiting to happen. A reproducible build is one where the same source inputs always produce bit-for-bit (or semantically equivalent) outputs. A hermetic build goes further: the build process is sealed from the outside world — no downloading from the internet at build time, no ambient environment variables, no host filesystem leakage. Together these two properties are the foundation of secure, auditable software supply chains.

Google, Meta, and Uber all run hermetic build systems (Bazel, Buck) precisely because at their scale a non-deterministic build is a security and reliability liability. The SLSA framework (Supply-chain Levels for Software Artifacts) formalises these requirements into leveled compliance tiers that regulators and enterprise customers now demand.

Why Reproducibility Is Hard

Most build systems are not reproducible by default. Common sources of non-determinism include:

  • Timestamps — file modification times, __DATE__/__TIME__ macros embedded in binaries.
  • Filesystem ordering — directory listings are not sorted on most filesystems; build tools that iterate them produce different archive member orders.
  • Floating-point and CPU variation — compiler auto-vectorisation can produce different machine code across CPU generations.
  • Random UUIDs or salts embedded into build outputs (some bundlers do this).
  • Unpinned dependencies — a pip install requests today may fetch 2.31.0; tomorrow it fetches 2.32.0.
  • Network fetches at build time — resolving DNS at build time introduces remote-state dependency.
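
Two of the sources above, timestamps and filesystem ordering, can be neutralised when packing an archive. The following is a minimal sketch rather than a production packer: the function name deterministic_tar is hypothetical, it handles regular files only, and it normalises every file mode to 0644 for simplicity.

```python
import io
import os
import tarfile


def deterministic_tar(src_dir: str, source_date_epoch: int = 0) -> bytes:
    """Return an uncompressed tar of src_dir with normalised metadata."""
    # Walk first and sort afterwards, so member order never depends on the
    # order in which the filesystem happens to list directory entries.
    members = []
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            path = os.path.join(root, name)
            arcname = os.path.relpath(path, src_dir).replace(os.sep, "/")
            members.append((arcname, path))
    members.sort()

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for arcname, path in members:
            with open(path, "rb") as f:
                data = f.read()
            info = tarfile.TarInfo(name=arcname)
            info.size = len(data)
            info.mtime = source_date_epoch  # fixed timestamp, not the file's mtime
            info.mode = 0o644               # normalised permissions (simplification)
            info.uid = info.gid = 0         # no host user or group IDs
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

Two directories with the same file contents should then yield byte-identical archives, regardless of when or in what order the files were written.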

Dependency Pinning and Lockfiles

The first and most impactful control is pinning every dependency to an exact version via a lockfile. Every major ecosystem has a lockfile mechanism. Commit the lockfile to git and treat divergence as a failing CI check.

    # Node.js (npm)
    npm ci    # installs exactly what package-lock.json specifies; fails if it disagrees with package.json
    # Avoid plain `npm install` in CI: it can silently update the lock

    # Python (pip with pip-tools)
    pip-compile --generate-hashes requirements.in      # deterministic resolution; writes requirements.txt with hashes
    pip install --require-hashes -r requirements.txt   # verifies every downloaded file's hash at install

    # Go: go.mod pins module@version; go.sum is the hash manifest
    go mod download    # fetches into the module cache; GONOSUMDB should be empty in CI

    # Rust: Cargo.lock is exact; always commit it for binaries
    cargo build --locked    # fails if Cargo.lock is stale

    # Java (Maven): resolve dependencies and fail on any checksum mismatch
    mvn --strict-checksums dependency:resolve

Do not .gitignore your lockfile. Many scaffolders add package-lock.json or Pipfile.lock to .gitignore by default. This is correct for reusable libraries (where you want to test across a range of dep versions) but wrong for deployed applications. Every application that runs in production must have its exact dependency graph locked and committed.
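
"Treat divergence as a failing CI check" can also cover the lockfile's contents. The sketch below flags any requirement line that lacks an exact == pin or a sha256 hash. It is a simplified parser for illustration, not pip's own logic, and the function name unpinned_requirements is hypothetical.

```python
import re

# An exact pin looks like name==version, with no ranges or wildcards.
_EXACT_PIN = re.compile(r"[A-Za-z0-9._\-\[\],]+==[^\s=<>~!*]+")


def unpinned_requirements(text: str) -> list:
    """Return requirement specs that lack an exact pin or a sha256 hash."""
    # Join backslash continuations so each requirement is one logical line.
    logical = text.replace("\\\n", " ")
    problems = []
    for raw in logical.splitlines():
        line = raw.split(" #", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("#") or line.startswith("-"):
            continue  # blank lines, comments, and options such as --index-url
        spec = line.split(";", 1)[0]  # drop environment markers
        requirement = spec.split("--hash", 1)[0].strip()
        has_exact_pin = _EXACT_PIN.fullmatch(requirement) is not None
        has_hash = "--hash=sha256:" in line
        if not (has_exact_pin and has_hash):
            problems.append(requirement)
    return problems
```

A CI step could call this on requirements.txt and fail the job when the returned list is non-empty.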

Hash-Verified Downloads

A version pin is not enough — a package registry can be compromised, or a version tag can be moved (mutable tags are a real attack vector). The fix is to pin content hashes (SHA-256) alongside version strings.

    # pip: requirements.txt with hashes (generated by pip-compile --generate-hashes)
    requests==2.31.0 \
        --hash=sha256:58cd2187423d21a6be1c2d4c12... \
        --hash=sha256:c5b6e7a24c34e6fa7d9c29b...

    # Docker: always pull by digest in production Dockerfiles
    FROM python:3.12.3-slim@sha256:9f7be3b0d1e0b9b2f5c49... AS base

    # Terraform: provider pinning with hashes in .terraform.lock.hcl (auto-managed; commit it)
    provider "registry.terraform.io/hashicorp/aws" {
      version     = "5.50.0"
      constraints = "~> 5.0"
      hashes = [
        "h1:xyz123...",
        "zh:abc456...",
      ]
    }

Renovate Bot / Dependabot can automatically open PRs when new versions are available, including updated hashes. This gives you both freshness (security patches) and determinism (locked hashes). Configure it to run weekly against non-breaking semver ranges and auto-merge after CI passes.
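
For artifacts fetched outside a package manager (a toolchain tarball, for example), the same check is easy to do by hand. This is a minimal sketch using only the standard library; the function names are illustrative.

```python
import hashlib
import hmac


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_sha256: str) -> None:
    """Raise ValueError unless the file's SHA-256 matches the pinned value."""
    actual = sha256_file(path)
    if not hmac.compare_digest(actual, expected_sha256.lower()):
        raise ValueError(
            f"hash mismatch for {path}: expected {expected_sha256}, got {actual}"
        )
```

The expected hash should come from a file committed to the repository, never from the same server that serves the download.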

Build Metadata and Provenance

Reproducibility answers "can I rebuild this?" Provenance answers "who built this, from what source, on what machine, at what time?" Both are required for a mature supply chain.

Build metadata is structured information stamped into artifacts at build time:

  • git.commit — the exact commit SHA that produced this build.
  • git.branch / git.tag — branch or semver tag.
  • build.timestamp — RFC-3339 UTC timestamp (store as metadata, not embedded in the binary, to preserve reproducibility).
  • build.runner — CI system, runner ID, pipeline URL.
  • build.builder_image — the exact Docker image digest used to compile.

    # Stamp a Go binary via -ldflags
    GIT_COMMIT=$(git rev-parse HEAD)                   # full SHA, so the commit is identified exactly
    GIT_TAG=$(git describe --tags --always --dirty)
    BUILD_DATE=$(date -u +%Y-%m-%dT%H:%M:%SZ)

    # Embedding BuildDate makes the binary differ on every build
    go build -ldflags "\
      -X main.Version=${GIT_TAG} \
      -X main.GitCommit=${GIT_COMMIT} \
      -X main.BuildDate=${BUILD_DATE}" \
      -o bin/myapp ./cmd/myapp

    # For fully reproducible output, omit BuildDate from ldflags
    # and store it as an OCI image label instead:
    docker build \
      --label org.opencontainers.image.revision="${GIT_COMMIT}" \
      --label org.opencontainers.image.version="${GIT_TAG}" \
      --label org.opencontainers.image.created="${BUILD_DATE}" \
      --label org.opencontainers.image.source="https://github.com/myorg/myapp" \
      -t "myapp:${GIT_TAG}" .
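
The same fields can be written to a sidecar file stored next to the artifact, which keeps the timestamp out of the binary. The sketch below assumes the CI system exposes the values under the hypothetical variable names GIT_COMMIT, GIT_TAG, CI_PIPELINE_URL and BUILDER_IMAGE_DIGEST. Only SOURCE_DATE_EPOCH is a standard name.

```python
import json
from datetime import datetime, timezone


def build_metadata(env: dict) -> str:
    """Render build metadata as JSON with a stable key order."""
    # Derive the timestamp from SOURCE_DATE_EPOCH (e.g. the commit time), so
    # rebuilding the same commit yields the same metadata file.
    epoch = int(env["SOURCE_DATE_EPOCH"])
    timestamp = datetime.fromtimestamp(epoch, tz=timezone.utc).strftime(
        "%Y-%m-%dT%H:%M:%SZ"
    )
    record = {
        "git.commit": env["GIT_COMMIT"],                       # hypothetical variable name
        "git.tag": env.get("GIT_TAG", ""),                     # hypothetical variable name
        "build.timestamp": timestamp,                          # RFC 3339, UTC
        "build.runner": env.get("CI_PIPELINE_URL", ""),        # hypothetical variable name
        "build.builder_image": env.get("BUILDER_IMAGE_DIGEST", ""),  # hypothetical
    }
    return json.dumps(record, sort_keys=True, indent=2) + "\n"
```

In CI this would typically be called as build_metadata(dict(os.environ)) and the result uploaded alongside the artifact.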

SLSA Provenance Attestations

SLSA (Supply-chain Levels for Software Artifacts, pronounced "salsa") defines graded trust levels: version 0.1 had four (SLSA 1 to 4), and version 1.0 reorganises them into tracks, with Build levels L1 to L3. SLSA 2, achievable by any team with a hosted CI system, requires a signed provenance attestation: a machine-verifiable document stating what source produced this artifact, on what build platform, with what inputs.

The slsa-framework/slsa-github-generator project provides reusable GitHub Actions workflows that generate SLSA 3 provenance. Consumers verify the attestation before installing.

    # .github/workflows/release.yml: SLSA 3 provenance for a Go binary
    jobs:
      build:
        permissions:
          id-token: write   # needed to sign the provenance
          contents: write   # the generator's documentation asks for this to upload release assets
          actions: read     # the generator's documentation asks for this to read the workflow path
        uses: slsa-framework/slsa-github-generator/.github/workflows/builder_go_slsa3.yml@v2.0.0
        with:
          go-version: "1.22"

    # Verify a container image attestation (cosign from Sigstore)
    cosign verify-attestation \
      --type slsaprovenance \
      --certificate-identity-regexp "^https://github.com/myorg/myapp" \
      --certificate-oidc-issuer "https://token.actions.githubusercontent.com" \
      myregistry.io/myapp:v1.4.2

[Figure: Hermetic build pipeline with provenance attestation. A SHA-pinned Git commit enters the hermetic build sandbox (locked, hash-verified dependencies; builder image pinned by digest; no network egress). The sandbox emits an artifact (OCI image or binary) and a Sigstore-signed SLSA provenance JSON that attests to it; the consumer runs cosign verify.]
A hermetic build sandbox produces an artifact and a signed SLSA provenance document; consumers verify both before deploying.
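
To make the shape of a provenance document concrete, here is a sketch of an unsigned in-toto statement carrying a SLSA v1 provenance predicate, plus the digest check a consumer performs once the signature has been verified. It is illustrative only: real provenance is generated and signed by the build platform, the buildType URI and function names are hypothetical, and signature verification is left to tools such as cosign.

```python
import hashlib


def make_statement(artifact_name, artifact_bytes, repo_uri, commit_sha, builder_id):
    """Build an unsigned in-toto statement with a SLSA v1 provenance predicate."""
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{
            "name": artifact_name,
            "digest": {"sha256": hashlib.sha256(artifact_bytes).hexdigest()},
        }],
        "predicateType": "https://slsa.dev/provenance/v1",
        "predicate": {
            "buildDefinition": {
                # Hypothetical build type URI, for illustration only.
                "buildType": "https://example.invalid/build-types/go/v1",
                "externalParameters": {"repository": repo_uri, "ref": commit_sha},
                "resolvedDependencies": [
                    {"uri": repo_uri, "digest": {"gitCommit": commit_sha}},
                ],
            },
            "runDetails": {"builder": {"id": builder_id}},
        },
    }


def subject_matches(statement, artifact_bytes):
    """True if the artifact's SHA-256 appears among the statement's subjects."""
    actual = hashlib.sha256(artifact_bytes).hexdigest()
    return any(
        s.get("digest", {}).get("sha256") == actual
        for s in statement.get("subject", [])
    )
```

A matching subject digest shows only that the attestation refers to this artifact. Trust in the statement itself still depends on verifying its signature and the builder identity.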

Hermetic Builds in Practice

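True hermeticity means the build sandbox has no outbound network access and uses only pre-fetched, hash-verified inputs. Bazel gets close to this by running each action in a sandbox that exposes only its declared inputs, with external dependencies fetched separately and pinned by hash. Teams not on Bazel can approximate it: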

  • Pre-fetch all dependencies into a registry mirror or vendor directory before the build step begins.
  • Use RUN --network=none (a BuildKit feature) for the Dockerfile steps that compile code; a separate fetch stage downloads the dependencies.
  • Run builds inside a pinned builder image (e.g., golang:1.22.3@sha256:...) so the compiler version is fixed.
  • Set SOURCE_DATE_EPOCH to the git commit timestamp to eliminate timestamp non-determinism in archive tools.

    # syntax=docker/dockerfile:1
    # Multi-stage Dockerfile: the fetch stage has network access; the compile step does not

    # Stage 1: fetch (network allowed)
    FROM golang:1.22.3-alpine@sha256:b4f5d3... AS fetch
    WORKDIR /app
    COPY go.mod go.sum ./
    # All deps are fetched into the module cache
    RUN go mod download -x

    # Stage 2: compile (no network needed; uses cached modules)
    FROM fetch AS build
    COPY . .
    # Set SOURCE_DATE_EPOCH to the git commit epoch for reproducibility.
    # Comments must sit on their own line: ENV does not allow trailing comments.
    ENV CGO_ENABLED=0 \
        SOURCE_DATE_EPOCH=0
    RUN --network=none go build -trimpath -ldflags "-s -w" -o /bin/app ./cmd/app

    # Stage 3: minimal runtime image
    FROM gcr.io/distroless/static-debian12@sha256:a7f4... AS runtime
    COPY --from=build /bin/app /app
    ENTRYPOINT ["/app"]

The -trimpath flag (Go) strips absolute host filesystem paths from the binary, eliminating a major source of non-determinism between developers on different machines. Other toolchains have equivalents: rustc accepts --remap-path-prefix, and GCC and Clang accept -ffile-prefix-map. Many archive and packaging tools, including Python wheel builds, honour SOURCE_DATE_EPOCH, and some JavaScript bundlers offer options for deterministic output.
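
Sealing off ambient environment variables is one part of hermeticity that does not need a container. The sketch below runs a build command with an explicit, minimal environment instead of inheriting the caller's. It assumes a POSIX host; the helper names and the chosen PATH are illustrative, and it does nothing about network or filesystem isolation.

```python
import subprocess
from typing import Optional


def hermetic_env(source_date_epoch: int, extra: Optional[dict] = None) -> dict:
    """Build an explicit environment; nothing leaks in from os.environ."""
    env = {
        "PATH": "/usr/bin:/bin",  # fixed tool search path (assumption)
        "LC_ALL": "C",            # stable sort order and message locale
        "TZ": "UTC",              # no host time zone
        "SOURCE_DATE_EPOCH": str(source_date_epoch),
    }
    env.update(extra or {})
    return env


def run_build(cmd: list, source_date_epoch: int) -> subprocess.CompletedProcess:
    """Run cmd with only the allowlisted variables set."""
    return subprocess.run(
        cmd,
        env=hermetic_env(source_date_epoch),
        check=True,
        capture_output=True,
        text=True,
    )
```

Passing env= to subprocess.run replaces the inherited environment entirely, which is why a variable exported in the developer's shell never reaches the build.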

Verifying Reproducibility

To confirm a build is truly reproducible, rebuild from the same inputs on a different machine and compare artifact hashes. The Reproducible Builds project publishes tools for this:

    # Rebuild and compare Docker image IDs
    docker build --no-cache -t myapp:v1.4.2-rebuild .
    docker inspect --format='{{.Id}}' myapp:v1.4.2 myapp:v1.4.2-rebuild
    # Both IDs must match

    # diffoscope: deep diff of two binaries to find sources of non-determinism
    diffoscope ./bin/app-build1 ./bin/app-build2

    # For OCI images in a registry: print the manifest digest
    crane digest myregistry.io/myapp:v1.4.2
    # Store this digest in your release notes and SBOM
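
When a build emits a directory of files rather than a single binary, a per-file hash comparison narrows down where to point diffoscope. This is a minimal sketch with hypothetical function names.

```python
import hashlib
import os


def tree_digests(root: str) -> dict:
    """Map each file's relative path under root to its SHA-256 hex digest."""
    digests = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            with open(path, "rb") as f:
                digests[rel] = hashlib.sha256(f.read()).hexdigest()
    return digests


def differing_files(build_a: str, build_b: str) -> list:
    """Return sorted paths whose content differs or that exist on one side only."""
    a, b = tree_digests(build_a), tree_digests(build_b)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

An empty list from differing_files means the two output trees have identical file contents. File metadata such as modes and mtimes is not compared.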

Reproducible and hermetic builds are not optional niceties at production scale — they are the foundation on which software supply chain security, binary transparency, and efficient caching are built. SLSA compliance increasingly appears in enterprise procurement requirements and government security mandates (NIST SP 800-218, EO 14028). Teams that invest in these practices early avoid painful retrofits when auditors arrive.