Advanced Docker & Container Security

BuildKit & Build Performance

The legacy Docker build engine, the one docker build invoked before BuildKit, processed a Dockerfile strictly in sequence. It could not parallelize independent stages, it offered no secret mechanism (so credentials passed through ARG or ENV ended up in the image history), and its cache was coarse-grained. BuildKit, the default engine since Docker Engine 23.0, rebuilds the builder around a dependency graph. For teams running builds at scale, understanding BuildKit's internals pays off directly: a 90-second CI build that could take 18 seconds wastes engineer-hours every day once it is multiplied across a fleet.

How BuildKit Differs Architecturally

BuildKit represents a Dockerfile as a directed acyclic graph (DAG) of LLB (Low-Level Build) operations rather than a flat list of instructions. This graph is sent to the buildkitd daemon, which can schedule independent nodes in parallel, skip subgraphs whose outputs are already cached, and stream layers as content-addressable blobs. The result: multi-stage builds that are genuinely concurrent, not just apparently so.

BuildKit schedules independent stages in parallel and merges their outputs into the final image.
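
To make the graph concrete, here is a minimal sketch of a Dockerfile whose first two stages have no dependency on each other. The ui/ directory, the npm build script that writes to dist, and the ./cmd/server package are hypothetical and stand in for whatever the real project contains.

```dockerfile
# syntax=docker/dockerfile:1

# The "assets" and "server" stages never reference each other,
# so BuildKit is free to execute them concurrently.
FROM node:20-alpine AS assets
WORKDIR /ui
# Hypothetical front-end project living in ui/
COPY ui/package.json ui/package-lock.json ./
RUN npm ci
COPY ui/ ./
RUN npm run build

FROM golang:1.22-alpine AS server
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /out/server ./cmd/server

# The final stage depends on both earlier stages: it is the join point of the graph.
FROM gcr.io/distroless/static-debian12
COPY --from=server /out/server /server
COPY --from=assets /ui/dist /static
ENTRYPOINT ["/server"]
```

Building this with docker build --progress=plain . should show steps from both stages interleaved in the log, which is the visible sign that the two branches are being scheduled independently.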

Enabling BuildKit

Since Docker Engine 23.0, BuildKit is the default. On older setups or in CI environments, force it explicitly:

# Per-invocation (environment variable)
DOCKER_BUILDKIT=1 docker build -t myapp:latest .

# Permanently in Docker daemon config (/etc/docker/daemon.json)
{
  "features": {
    "buildkit": true
  }
}

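# Verify BuildKit is active: its output uses numbered steps such as "#1 [internal] load build definition",
# whereas the legacy builder prints "Sending build context to Docker daemon"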
docker build --progress=plain -t myapp:latest . 2>&1 | head -5

Cache Mounts: The Biggest Single Win

A cache mount (--mount=type=cache) attaches a persistent directory to a RUN instruction. Its contents survive across builds but never become part of an image layer. This is the canonical way to cache package-manager downloads. Without a cache mount, any change that invalidates the layer forces go mod download or pip install to fetch every dependency again.

# syntax=docker/dockerfile:1
FROM golang:1.22-alpine AS builder

WORKDIR /src
COPY go.mod go.sum ./

# Cache mount: the Go module cache is stored outside the image layer graph.
# The official golang image sets GOPATH=/go, so modules live in /go/pkg/mod.
# The id "go-mod" is shared across all builds on this BuildKit instance.
RUN --mount=type=cache,id=go-mod,target=/go/pkg/mod \
    go mod download

COPY . .

# Mount the module cache again (it is not in any layer) plus a second cache
# for the Go build cache, which enables incremental compilation.
RUN --mount=type=cache,id=go-mod,target=/go/pkg/mod \
    --mount=type=cache,id=go-build,target=/root/.cache/go-build \
    CGO_ENABLED=0 GOOS=linux go build -trimpath -o /app/server ./cmd/server

FROM gcr.io/distroless/static-debian12
COPY --from=builder /app/server /server
ENTRYPOINT ["/server"]

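Cache mount scope matters. A cache is identified by its id, which defaults to the target path, so two Dockerfiles that mount the same id share one cache on a given BuildKit instance. The default is sharing=shared, which lets concurrent builds read and write the cache at the same time and is fine for most tools. When a tool cannot tolerate concurrent writers, set sharing=locked (builds take turns using the cache) or sharing=private (a concurrent build gets its own separate cache).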
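
apt is a common case where the sharing mode matters, because it takes exclusive locks on its own directories. The sketch below follows the pattern used in Docker's own documentation. The base image tag, the cache ids (apt-cache, apt-lib) and the installed package are illustrative choices, not requirements.

```dockerfile
# syntax=docker/dockerfile:1
FROM ubuntu:24.04

# The official image deletes downloaded .deb files after each install;
# removing that hook lets the cache mount actually retain them.
RUN rm -f /etc/apt/apt.conf.d/docker-clean; \
    echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache

# apt needs exclusive access to these directories, so concurrent builds
# must take turns (sharing=locked) rather than write at the same time.
RUN --mount=type=cache,id=apt-cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,id=apt-lib,target=/var/lib/apt,sharing=locked \
    apt-get update && \
    apt-get install -y --no-install-recommends gcc
```

With sharing=locked a second build that reaches this step simply waits for the first one to finish, which trades a little wall-clock time for a cache that is never written by two apt processes at once.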

Build Secrets: Never in Layers

The pre-BuildKit workarounds for secrets were all leaky: ARG and ENV values are recorded in image metadata and visible via docker history, and copying a credential file into a stage leaves it in that stage's layers. BuildKit secrets are mounted (on tmpfs) only for the duration of a RUN instruction and are never written to any layer. Together with SSH mounts, this is the production-safe way to consume credentials at build time.

# syntax=docker/dockerfile:1
FROM python:3.12-slim

WORKDIR /app
COPY requirements.txt .

# The secret is mounted at /run/secrets/pip_token during this RUN only.
# It does not appear in docker history or any layer.
RUN --mount=type=secret,id=pip_token \
    pip install \
      --extra-index-url https://$(cat /run/secrets/pip_token)@pypi.company.internal/simple \
      --no-cache-dir \
      -r requirements.txt

COPY src/ ./src/

# Build invocation — pass the secret from the host environment or a file
# docker build --secret id=pip_token,env=PIP_TOKEN .
# docker build --secret id=pip_token,src=/run/secrets/pip_token .

Never use ARG to pass secrets. Even if the ARG is never echoed, its value is recorded in the image's build history whenever a RUN instruction uses it, and anyone who can pull the image can recover it with docker history --no-trunc. For tokens that grant access to internal package registries, that is a direct credential leak. Use --mount=type=secret instead.
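
The leak is easy to reproduce with a throwaway value. The following is a demonstration sketch only: DEMO_TOKEN and the image name arg-leak-demo are made-up names, and the value passed in must never be a real credential.

```dockerfile
# syntax=docker/dockerfile:1
FROM alpine:3.20

# ANTI-PATTERN, shown only to demonstrate the leak.
ARG DEMO_TOKEN

# Any RUN that executes while the ARG is in scope has the value
# recorded alongside it in the image history.
RUN test -n "$DEMO_TOKEN"

# Build with a dummy value, then inspect the history:
#   docker build --build-arg DEMO_TOKEN=not-a-real-secret -t arg-leak-demo .
#   docker history --no-trunc arg-leak-demo | grep DEMO_TOKEN
```

The grep should match the RUN step with the dummy value attached, even though the Dockerfile never prints it. Repeating the experiment with --mount=type=secret in place of ARG should leave nothing for the grep to find.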

SSH Agent Forwarding at Build Time

Private Git dependencies require SSH access during the build. BuildKit provides --mount=type=ssh, which forwards the host's SSH agent socket into the build, so no key files ever touch the image:

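# In the Dockerfile (git and openssh-client must be installed in the image)
# Trust GitHub's host key first; otherwise the clone fails host key verification.
RUN mkdir -p -m 0700 ~/.ssh && ssh-keyscan github.com >> ~/.ssh/known_hosts
RUN --mount=type=ssh \
    git clone git@github.com:company/private-lib.git /tmp/lib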

# Build invocation — ensure ssh-agent has the key loaded
ssh-add ~/.ssh/id_ed25519
docker build --ssh default .

# In GitHub Actions (CI)
- uses: webfactory/ssh-agent@v0.9.0
  with:
    ssh-private-key: ${{ secrets.DEPLOY_KEY }}
- run: docker build --ssh default .

docker buildx & Multi-Platform Builds

buildx is the CLI plugin that exposes the full BuildKit feature set. Its most production-critical feature is multi-platform builds: building linux/amd64 and linux/arm64 in a single command and pushing one multi-arch manifest list. On AWS (Graviton) and GCP (Tau T2A), arm64 instances are commonly cited as roughly 20–40% cheaper for compute-bound workloads, though the real saving depends on the workload.

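# Register QEMU emulators on the host (not needed on Docker Desktop, which ships them)
docker run --privileged --rm tonistiigi/binfmt --install all

# Create a container-driver builder, which can emulate non-native platforms via QEMU
docker buildx create --name multiarch --driver docker-container --bootstrap
docker buildx use multiarch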

# Build and push a multi-arch image in one step
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --tag registry.example.com/myapp:1.2.3 \
  --tag registry.example.com/myapp:latest \
  --push \
  .

# Inspect the resulting manifest list
docker buildx imagetools inspect registry.example.com/myapp:latest

Use native builders when possible. QEMU emulation is slow: a 3-minute native build can take 25 minutes under emulation. In CI, provision one amd64 runner (for example ubuntu-latest) and one arm64 runner (GitHub-hosted, such as ubuntu-24.04-arm, or self-hosted), build and push each platform natively under its own tag, then merge them with docker buildx imagetools create --tag <final-tag> <amd64-image> <arm64-image>, which writes a manifest list that references both.
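
When a second native runner is not available, a different technique from the one described above can also avoid emulation for compiled languages: cross-compiling inside a build stage that runs on the host's own architecture. This is a sketch under the assumption that the toolchain can cross-compile (Go can, with CGO disabled). It reuses the ./cmd/server layout from the earlier example.

```dockerfile
# syntax=docker/dockerfile:1

# Run the compiler on the build host's native architecture...
FROM --platform=$BUILDPLATFORM golang:1.22-alpine AS builder

# ...and let BuildKit say which platform to produce binaries for.
# TARGETOS and TARGETARCH are set automatically for each --platform value.
ARG TARGETOS
ARG TARGETARCH

WORKDIR /src
COPY go.mod go.sum ./
RUN --mount=type=cache,target=/go/pkg/mod \
    go mod download
COPY . .
RUN --mount=type=cache,target=/go/pkg/mod \
    --mount=type=cache,target=/root/.cache/go-build \
    CGO_ENABLED=0 GOOS=$TARGETOS GOARCH=$TARGETARCH \
    go build -trimpath -o /app/server ./cmd/server

# The final stage is pulled for the target platform. It only copies files,
# so no RUN step ever executes under emulation.
FROM gcr.io/distroless/static-debian12
COPY --from=builder /app/server /server
ENTRYPOINT ["/server"]
```

Built with docker buildx build --platform linux/amd64,linux/arm64, the builder stage runs natively once per target and only the GOARCH value differs. This does not help for images whose RUN steps must execute target-architecture binaries; for those, native runners remain the faster option.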

Inline Cache and Remote Cache Backends

BuildKit can export its cache to a registry so that cold CI runners reuse prior build artifacts. With --cache-to and --cache-from, a fresh runner achieves near-warm-cache build times:

# Export cache to registry (max mode caches all intermediate layers, not just final)
docker buildx build \
  --cache-from type=registry,ref=registry.example.com/myapp:cache \
  --cache-to   type=registry,ref=registry.example.com/myapp:cache,mode=max \
  --tag registry.example.com/myapp:${GIT_SHA} \
  --push \
  .

# GitHub Actions — full pattern
- name: Build and push
  uses: docker/build-push-action@v6
  with:
    push: true
    tags: registry.example.com/myapp:${{ github.sha }}
    cache-from: type=registry,ref=registry.example.com/myapp:cache
    cache-to: type=registry,ref=registry.example.com/myapp:cache,mode=max
    platforms: linux/amd64,linux/arm64

Cache mode=max vs mode=min. mode=min (the default) exports cache only for the layers of the final image, which keeps the cache blob small. mode=max exports every intermediate layer of every stage, which gives better hit rates for multi-stage builds at the cost of a larger cache image. In monorepos with many shared base stages, mode=max almost always wins.

Production Failure Modes

Cache poisoning via mutable tags: --cache-from with a :latest cache tag that another job is simultaneously overwriting leads to non-deterministic builds. Pin cache tags to a branch or a stable ref.
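Cold cache mounts in CI: BuildKit cache mounts are local to the builder and are not exported by --cache-to. On ephemeral CI runners the mount therefore starts empty on every run. Rely on registry cache backends for layer reuse in CI, and treat cache mounts as a bonus on long-lived builders.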
buildkitd OOM on large monorepos: Increase buildkitd memory limits and configure --oci-worker-snapshotter=overlayfs on Linux for better layer deduplication.
Secret leakage via build args: See the warning above. Audit CI logs for ARG-passed tokens — they appear in plain text in verbose build output.