Continuous Integration Fundamentals

CI for Containers & Monorepos

Two architectural realities dominate modern big-tech CI: nearly every deployable unit is a container image, and many teams cohabit a single repository (monorepo). These two facts collide hard in CI — naive pipelines build and test everything on every push, burning hundreds of minutes of runner time per commit. This lesson covers how production teams handle both: building images efficiently inside CI and running only the affected targets when only a slice of the codebase changed.

Building Container Images in CI

A Dockerfile is source code, and its build should be as reproducible as any other build step. The CI pipeline is responsible for building it, tagging the result, pushing it to a registry, and making the digest available for deployment. The four non-negotiables for production image builds are:

BuildKit is mandatory. The legacy docker build is serial and has no remote cache API. BuildKit is the default builder since Docker Engine 23.0; on older engines, enable it by setting DOCKER_BUILDKIT=1 or by using docker/build-push-action, which builds through Buildx and therefore BuildKit. BuildKit parallelises independent stages, skips stages the target does not depend on, and supports the RUN --mount=type=cache flag for dependency caches inside the build.
Layer order determines cache hit rate. Copy files that change rarely (lock files, static configs) early; copy application source last. A misplaced COPY . . at step 3 of 10 misses the cache on nearly every commit and forces steps 4 through 10 to re-run as well.
Multi-stage builds keep images minimal. A build stage with compilers, SDKs, and test tooling should not ship to production. The final FROM stage should be a distroless or slim base that contains only the runtime binary and its dependencies.
Always push by digest, deploy by digest. Tags are mutable aliases. The CI pipeline should record the sha256 digest of the pushed image (available from docker/build-push-action outputs) and pass it to the deploy stage.

# Dockerfile — multi-stage build for a Go service
FROM golang:1.23-alpine AS builder
WORKDIR /src
COPY go.mod go.sum ./
# Cached until go.mod or go.sum changes
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /bin/api ./cmd/api

# Final stage: distroless static base (CA certificates, tzdata, no shell)
FROM gcr.io/distroless/static-debian12
COPY --from=builder /bin/api /api
ENTRYPOINT ["/api"]
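
The layer-order rule can be checked with a toy model. The sketch below is a simplification, not how BuildKit computes cache keys: it assumes each step is described only by the context files it reads, and that the first step whose inputs changed forces every later step to re-run. The file list in SOURCE_TREE is hypothetical.

```python
def steps_to_rebuild(steps, changed_files):
    """Return the names of the build steps that miss the cache.

    `steps` is an ordered list of (name, files_read_from_context) pairs.
    The first step whose inputs changed misses, and so does every step after it.
    """
    changed = set(changed_files)
    for index, (_name, inputs) in enumerate(steps):
        if changed & set(inputs):
            return [name for name, _inputs in steps[index:]]
    return []


SOURCE_TREE = {"go.mod", "go.sum", "cmd/api/main.go"}  # hypothetical file list

# Order used in the Dockerfile above: dependency manifests first, source last.
manifests_first = [
    ("COPY go.mod go.sum ./", {"go.mod", "go.sum"}),
    ("RUN go mod download", set()),
    ("COPY . .", SOURCE_TREE),
    ("RUN go build", set()),
]

# Same steps with the broad COPY moved to the top.
source_first = [
    ("COPY . .", SOURCE_TREE),
    ("RUN go mod download", set()),
    ("RUN go build", set()),
]

edit = ["cmd/api/main.go"]
print(steps_to_rebuild(manifests_first, edit))  # ['COPY . .', 'RUN go build']
print(steps_to_rebuild(source_first, edit))     # all three steps re-run
```

With the manifests copied first, a source-only edit leaves the go mod download step cached; with COPY . . first, the module download repeats on every commit.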

# .github/workflows/ci.yml — building and pushing a container image
name: CI

on:
  push:
    branches: [main]
  pull_request:

jobs:
  build-image:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
      id-token: write          # OIDC token; only needed if a later step signs or attests the image

    outputs:
      digest: ${{ steps.build.outputs.digest }}

    steps:
      - uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Log in to GHCR
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract image metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ghcr.io/${{ github.repository }}/api
          tags: |
            type=sha,format=short          # sha-abc1234 (default prefix is sha-)
            type=ref,event=pr             # pr-42
            type=raw,value=latest,enable=${{ github.ref == 'refs/heads/main' }}

      - name: Build and push
        id: build
        uses: docker/build-push-action@v6
        with:
          context: .
          file: ./services/api/Dockerfile
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
          provenance: true               # SLSA provenance attestation
          sbom: true                     # generate SBOM

      - name: Print digest
        if: github.event_name != 'pull_request'   # nothing is pushed on PRs
        run: echo "Pushed ${{ steps.build.outputs.digest }}"

Do not push images on pull requests. Set push: ${{ github.event_name != 'pull_request' }} as shown above. Building (and optionally scanning) the image on a PR gives you fast feedback without polluting the registry with thousands of unreviewed images. On merge to main, the image is built again and pushed. That build is usually cheap but not free: GitHub scopes Actions caches by branch, so a run on main can restore layers cached by earlier main runs but not layers written by the PR's own runs. Only layers unchanged since the last main build are reused.
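
Deploying by digest means the deploy stage needs a reference of the form repository@sha256:... rather than repository:tag. The helper below is a minimal sketch of that conversion, assuming the digest arrives as the string that the build step reports; the function name pin_to_digest is hypothetical.

```python
import re

_DIGEST_RE = re.compile(r"sha256:[0-9a-f]{64}")


def pin_to_digest(image, digest):
    """Return `repository@digest`, dropping any tag or digest already on `image`."""
    if not _DIGEST_RE.fullmatch(digest):
        raise ValueError(f"not a sha256 digest: {digest!r}")
    repository = image.split("@", 1)[0]
    # A tag is a colon after the last slash; a colon before it is a registry port.
    last_slash = repository.rfind("/")
    colon = repository.rfind(":")
    if colon > last_slash:
        repository = repository[:colon]
    return f"{repository}@{digest}"


example_digest = "sha256:" + "a" * 64  # placeholder value, not a real image digest
print(pin_to_digest("ghcr.io/myorg/api:latest", example_digest))
```

Rejecting anything that is not a full sha256 digest keeps a mutable tag from slipping through to the deploy stage by accident.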

Path-Filtered Triggers

In a standard repository, pushing a README change should not rebuild and re-test the entire application. Path filters restrict which file changes trigger a workflow. GitHub Actions supports them natively via the paths and paths-ignore keys on push and pull_request events.

# Trigger only when the API service, the root Dockerfile, or this workflow file changes
on:
  push:
    branches: [main]
    paths:
      - 'services/api/**'
      - 'Dockerfile'
      - '.github/workflows/api.yml'
  pull_request:
    paths:
      - 'services/api/**'
      - 'Dockerfile'
      - '.github/workflows/api.yml'

The pattern services/api/** uses glob syntax: ** matches any number of path segments including zero. Common patterns used in production:

src/**/*.ts: any TypeScript file anywhere under src/
!docs/**: exclude changes under docs/. A negated pattern only works in a paths list that also contains a positive pattern before it, because patterns are applied in order.
packages/shared/**: changes to a shared library that many services depend on
.github/workflows/api.yml: the workflow file itself; changing it should re-run the workflow
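
The way ordered patterns and negation combine is easier to see in code. The sketch below approximates the filter semantics and is not GitHub's implementation: it supports only *, ** and a leading !, lets the last matching pattern decide for each file, and triggers when at least one changed file stays included. The function names are hypothetical.

```python
import re


def _to_regex(pattern):
    """Translate a simplified path glob into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append(r"(?:.*/)?")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(r".*")        # anything, including '/'
            i += 2
        elif pattern[i] == "*":
            parts.append(r"[^/]*")     # anything except '/'
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def is_included(path, patterns):
    """Apply the patterns in order; the last one that matches decides."""
    included = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        glob = pattern[1:] if negated else pattern
        if _to_regex(glob).fullmatch(path):
            included = not negated
    return included


def workflow_should_run(changed_files, patterns):
    """Trigger when at least one changed file survives the filter."""
    return any(is_included(path, patterns) for path in changed_files)


filters = ["services/api/**", "src/**/*.ts", "!services/api/docs/**"]
print(workflow_should_run(["services/api/main.go"], filters))         # True
print(workflow_should_run(["services/api/docs/readme.md"], filters))  # False
```

Moving the negated pattern to the front of the list would change the second result, because the later services/api/** pattern would include the file again.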

Path filters can silently block required status checks. GitHub branch protection rules require certain CI checks to pass before merge. If a PR touches only docs/ and the CI workflow is path-filtered to skip docs changes, the required check never runs, and GitHub leaves it in a pending state, which blocks the merge. Two fixes are common: add a lightweight workflow that triggers on the skipped paths and reports a passing check under the same name, or trigger the workflow on every PR and move the path test into a job-level if condition, because a job skipped by a condition reports success while a workflow skipped by a path filter stays pending.

Affected-Target Builds in Monorepos

A monorepo hosts multiple services, libraries, and applications in one repository. Google's internal build system, Blaze (open-sourced as Bazel), popularised affected-target builds: build and test only the changed targets plus the targets that transitively depend on them. This is what keeps per-commit feedback practical in a repository with a very large number of targets, because the cost of a CI run scales with the size of the change rather than the size of the repository.

The four dominant tools for affected-target analysis in open-source monorepos are:

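Nx (JavaScript/TypeScript, with plugins for other languages): nx affected --target=build computes the affected projects from the project dependency graph and a commit range.
Turborepo (JavaScript/TypeScript): turbo run build --filter=...[HEAD^1] selects the packages changed since HEAD^1 plus their dependents; without the leading ..., only the changed packages themselves run. Content-hash caching then skips any task whose inputs are unchanged.
Bazel (polyglot): bazel build $(bazel query "rdeps(//..., set($(git diff --name-only HEAD^)))") uses the full dependency graph to find the reverse dependencies of changed files. The query must be in double quotes so that the shell expands the inner $(git diff ...); deleted files and files outside any package can make the query fail and need filtering first.
Pants (Python, Java, Go, Scala): pants --changed-since=origin/main --changed-dependents=transitive test tests the changed targets and everything that depends on them; without --changed-dependents, only the directly changed targets run. Older releases spell the option --changed-dependees.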

For example, when shared-utils changes, only api-service, auth-lib, and gateway, which transitively depend on it, are rebuilt. billing-service does not depend on it and is skipped entirely.
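
The affected set is the changed targets plus their transitive reverse dependencies. The sketch below shows that computation as a breadth-first walk over an inverted dependency graph. The edges in DEPS are an assumption made for illustration; the text only states which targets end up affected, not how they depend on each other.

```python
from collections import defaultdict, deque


def affected_targets(deps, changed):
    """Return the changed targets plus everything that transitively depends on them.

    `deps` maps each target to the targets it depends on directly.
    """
    # Invert the graph: dependency -> targets that depend on it.
    dependents = defaultdict(set)
    for target, requirements in deps.items():
        for requirement in requirements:
            dependents[requirement].add(target)

    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


# Hypothetical edges, chosen to reproduce the example in the text.
DEPS = {
    "shared-utils": [],
    "auth-lib": ["shared-utils"],
    "api-service": ["auth-lib"],
    "gateway": ["api-service"],
    "billing-service": [],
}

print(sorted(affected_targets(DEPS, {"shared-utils"})))
# ['api-service', 'auth-lib', 'gateway', 'shared-utils']
```

Real tools build the same graph from package manifests or BUILD files and map changed files to targets before running this walk.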

Nx Affected in GitHub Actions

The most common JavaScript/TypeScript monorepo setup uses Nx. The key is computing the base commit (BASE) that represents the last known-good state: typically the merge base with the target branch for PRs, and for pushes to main the last commit that passed CI, which is what nrwl/nx-set-shas looks up.

# .github/workflows/ci.yml — Nx affected targets in a monorepo
name: CI (Monorepo)

on:
  push:
    branches: [main]
  pull_request:

jobs:
  affected:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0               # full history required for git diff

      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm

      - run: npm ci

      - name: Derive base and head SHAs
        uses: nrwl/nx-set-shas@v4      # sets NX_BASE and NX_HEAD env vars

      - name: Lint affected
        run: npx nx affected --target=lint --base=$NX_BASE --head=$NX_HEAD --parallel=3

      - name: Test affected
        run: npx nx affected --target=test --base=$NX_BASE --head=$NX_HEAD --parallel=3

      - name: Build affected
        run: npx nx affected --target=build --base=$NX_BASE --head=$NX_HEAD --parallel=3

      - name: Docker build affected
        run: |
          npx nx affected --target=docker-build --base=$NX_BASE --head=$NX_HEAD

fetch-depth: 0 is non-negotiable in monorepos. Affected-target tools diff the current HEAD against a base commit. In a shallow clone that base commit is not present locally, so the diff cannot be computed and the tool either errors out or falls back to rebuilding everything, which defeats the purpose. Always set fetch-depth: 0 on the checkout step in monorepo workflows.

Per-Service Dockerfiles in a Monorepo

Each service in a monorepo has its own Dockerfile, but the Docker build context must include shared library code that lives outside the service directory. The solution is to always set the build context to the repository root and reference the Dockerfile by path:

# Build context = repo root; Dockerfile lives inside the service folder
docker build \
  --file services/api/Dockerfile \
  --tag ghcr.io/myorg/api:$SHA \
  --build-arg SERVICE=api \
  .                           # <-- context is repo root, not services/api/

# Inside services/api/Dockerfile, shared libs are reachable:
# COPY libs/shared-utils ./libs/shared-utils
# COPY services/api ./services/api

Docker build context scope is a common monorepo pitfall. If you run docker build services/api/ with the service directory as the context, Docker cannot access libs/shared-utils/ because it lives outside the context, and the COPY libs/shared-utils ... instruction fails with a "not found" error. Always build from the repo root. For large monorepos, use .dockerignore aggressively to exclude other services and their node_modules from the context; otherwise the context sent to the BuildKit daemon can run to gigabytes.

Remote Caching for Monorepos

Affected-target builds solve which targets to run. Remote caching solves whether to run them at all. If target X was built from source hash H and its result is already in the remote cache, skip the execution entirely and restore the output. This is how large monorepos keep incremental builds short: on a typical CI run most targets hit the cache.

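Nx Cloud: managed remote cache for Nx workspaces. Connect with nx connect and add NX_CLOUD_ACCESS_TOKEN to secrets. Cache hit rates vary with how often widely shared inputs change; treat figures such as 80-95% as an optimistic target rather than a guarantee.
Turborepo Remote Cache: built into turbo run, configured with the --api, --token, and --team flags (or the TURBO_API, TURBO_TOKEN, and TURBO_TEAM environment variables) pointing to a Vercel or self-hosted cache server.
Bazel Remote Cache: any gRPC or HTTP endpoint passed via --remote_cache; the open-source bazel-remote server (buchgr/bazel-remote, a community project rather than a Google one) can use GCS or S3 as a backing store.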
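
The skip-or-run decision behind all of these caches can be sketched in a few lines. This is a minimal model rather than any tool's actual key format: it assumes the key is a SHA-256 over the command, the input file contents, and the keys of the target's dependencies, and it uses an in-memory dict as a stand-in for the remote cache. Real tools also hash things like tool versions and environment variables.

```python
import hashlib


def cache_key(command, input_files, dep_keys):
    """Hash everything in this model that can change a target's output.

    `input_files` maps path -> file content (bytes); `dep_keys` are the
    cache keys of the target's direct dependencies.
    """
    digest = hashlib.sha256()
    digest.update(command.encode())
    for path in sorted(input_files):
        digest.update(b"\0" + path.encode() + b"\0")
        digest.update(hashlib.sha256(input_files[path]).digest())
    for key in sorted(dep_keys):
        digest.update(b"\0" + key.encode())
    return digest.hexdigest()


def run_cached(key, cache, build):
    """Return (output, cache_hit); call `build` only on a cache miss."""
    if key in cache:
        return cache[key], True
    output = build()
    cache[key] = output
    return output, False


remote_cache = {}  # stand-in for a remote cache service
key = cache_key("go build ./cmd/api", {"cmd/api/main.go": b"package main"}, [])

print(run_cached(key, remote_cache, lambda: b"binary")[1])  # False: first run builds
print(run_cached(key, remote_cache, lambda: b"binary")[1])  # True: second run restores
```

Because dependency keys feed into the key of each dependent, a change deep in the graph changes the keys of everything above it, which is the same set the affected-target analysis selects.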

Common Failure Modes

Shallow clone in affected builds: fetch-depth: 1 (the actions/checkout default) leaves the base commit out of the clone, so the diff fails and the tool rebuilds everything. Set fetch-depth: 0.
Missing .dockerignore: sending a 2 GB monorepo as the Docker build context can add tens of seconds to every image build. Maintain a root-level .dockerignore that excludes .git, test fixtures, and other services.
Pushing on PRs: the registry fills with unreviewed images from every PR commit. Gate the push input on github.event_name != 'pull_request'.
Implicit latest tag in production: deploying :latest makes rollbacks unreliable and makes it impossible to know what is running without inspecting the live container. Always deploy by the immutable digest or commit-SHA tag.
Sharing a single workflow file for all services: one YAML file with a matrix over all services builds everything on every push, negating the benefit of affected builds. Give each service (or service group) its own workflow file with appropriate path filters.