Continuous Integration Fundamentals

Artifacts & Build Outputs

18 min Lesson 6 of 28

Every CI pipeline produces outputs: compiled binaries, container images, test reports, coverage files, signed packages. These outputs are called artifacts. Treating artifacts as first-class citizens — with deterministic versioning, secure storage, and reliable inter-stage handoff — is what separates a hobbyist pipeline from a production-grade system at big-tech scale.

What Counts as an Artifact?

An artifact is any file produced by one stage of a pipeline that is either (a) consumed by a later stage, or (b) published for external use. Common examples include:

Compiled binaries — Go, Rust, Java JARs/WARs, .NET DLLs
Container images — OCI-format layers pushed to a registry
Language packages — npm tarballs, Python wheels, Ruby gems, Maven artifacts
Static sites — the output of next build, hugo, vite build
Test & coverage reports — JUnit XML, lcov HTML, SARIF files
Infrastructure plans — terraform plan JSON saved before apply
Release archives — tarballs or ZIPs attached to a GitHub Release

Artifact Storage: Where and Why It Matters

Artifacts must be stored outside the ephemeral runner. When a runner VM is recycled — or when a parallel job on a different machine needs the artifact — it must be retrievable from a stable, authenticated location. The main tiers are:

CI-native artifact stores — GitHub Actions Artifacts (backed by Azure Blob), GitLab Job Artifacts (local disk or object storage, depending on the instance). Zero-config, but storage counts against your plan's quota and retention is limited (GitHub keeps artifacts for 90 days by default; GitLab.com expires job artifacts after 30 days by default).
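Package registries — GitHub Packages, GitLab Package Registry, JFrog Artifactory, Sonatype Nexus. The right home for versioned, publishable artifacts (JARs, npm packages, Docker images). Most enforce immutability for release versions (npm and Maven Central, for example, refuse to republish 1.2.3), but container registries usually let a tag be pushed again unless you enable a tag-immutability setting where the registry offers one.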
Object storage — AWS S3, GCS, Azure Blob. Used by large teams for everything that does not fit a package registry: Terraform plans, large ML model checkpoints, browser test videos.
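For the object-storage tier, a common pattern is to authenticate with short-lived credentials and write each artifact under a path keyed by commit SHA. The sketch below saves a Terraform plan to S3. The IAM role ARN, region and bucket name are hypothetical, and it assumes the AWS account already trusts GitHub's OIDC provider.

```yaml
  terraform-plan:
    runs-on: ubuntu-latest
    permissions:
      id-token: write          # request an OIDC token instead of storing AWS keys as secrets
      contents: read
    steps:
      - uses: actions/checkout@v4

      - uses: hashicorp/setup-terraform@v3

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/ci-artifacts   # hypothetical role
          aws-region: us-east-1                                          # hypothetical region

      - name: Plan and upload
        run: |
          terraform init -input=false
          terraform plan -input=false -out=tfplan
          # Hypothetical bucket; one object per commit keeps each plan traceable
          aws s3 cp tfplan "s3://example-ci-artifacts/terraform/${GITHUB_SHA}/tfplan"
```

A later apply job can copy exactly this plan back down with aws s3 cp, so it applies what was reviewed rather than re-planning.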

Immutability is the contract. Once an artifact is published under a version tag (e.g., v2.4.1), it must never be overwritten. Any change — even a single byte — must produce a new version. This is why package registries reject re-uploads of the same version by default: a mutable artifact makes the entire supply chain untrustworthy.
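Container images are the usual gap in this contract, because a tag can be pushed twice. A minimal guard step could refuse to publish a version that already exists. It assumes a GHCR image named after the repository, an earlier docker/login-action step, and a workflow triggered by v* tag pushes. Any inspect failure (including network or auth errors) is treated as "not found", so this is a safety net rather than a guarantee.

```yaml
      - name: Refuse to overwrite an existing version
        env:
          IMAGE: ghcr.io/${{ github.repository }}
        run: |
          VERSION="${GITHUB_REF_NAME#v}"     # v2.4.1 -> 2.4.1
          # docker manifest inspect exits non-zero when the tag does not exist
          if docker manifest inspect "${IMAGE}:${VERSION}" > /dev/null 2>&1; then
            echo "::error::${IMAGE}:${VERSION} already exists; publish a new version instead"
            exit 1
          fi
```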

Deterministic Versioning Schemes

Artifact versions must be unique, traceable back to a commit, and sortable. The three patterns used in production are:

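SemVer from git tag — pushing the git tag v2.4.1 triggers a build that publishes version 2.4.1. Standard for public packages. Tools: git describe --tags, semantic-release.
Commit SHA suffix — 2.4.1-abc1234. Every merge to main produces an artifact, and the SHA makes the version traceable without a tag. Used heavily for internal services. Because SemVer treats a hyphen suffix as a pre-release, 2.4.1-abc1234 sorts before 2.4.1, and SHA suffixes sort arbitrarily rather than by build order; put the CI run number ahead of the SHA (e.g., 2.4.1-1042.abc1234) when ordering matters.
CalVer + build number — 2025.06.1042. Common in monorepos and mobile releases (the App Store requires numeric build numbers). The build number is typically the CI run number (github.run_number in GitHub Actions).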

In GitHub Actions, ${{ github.sha }} (and $GITHUB_SHA in shell steps) holds the full 40-character commit SHA; there is no built-in short form, so slice it in bash: SHA=${GITHUB_SHA::8}.
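A sketch of computing all three version formats in one step. It assumes releases are tagged v&lt;MAJOR&gt;.&lt;MINOR&gt;.&lt;PATCH&gt;, uses 8 characters of the SHA like the workflow below, and checks out with fetch-depth: 0 so git describe can see the tags. The output names sha_version, calver and semver are illustrative.

```yaml
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0                 # full history and tags, so git describe can find v* tags

      - name: Compute version strings
        id: version
        run: |
          SHORT_SHA="${GITHUB_SHA::8}"
          # Latest reachable v* tag, or v0.0.0 if the repository has none yet
          BASE="$(git describe --tags --abbrev=0 --match 'v*' 2>/dev/null || echo v0.0.0)"
          BASE="${BASE#v}"
          echo "sha_version=${BASE}-${SHORT_SHA}" >> "$GITHUB_OUTPUT"                # format: 2.4.1-abc1234f
          echo "calver=$(date -u +%Y.%m).${GITHUB_RUN_NUMBER}" >> "$GITHUB_OUTPUT"   # format: 2025.06.1042
          # SemVer from the tag applies only when the run was triggered by a v* tag push
          if [[ "$GITHUB_REF" == refs/tags/v* ]]; then
            echo "semver=${GITHUB_REF_NAME#v}" >> "$GITHUB_OUTPUT"                   # format: 2.4.1
          fi
```

Later steps read these as steps.version.outputs.sha_version, steps.version.outputs.calver and steps.version.outputs.semver.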

Artifacts are uploaded to a central store after the build stage, then downloaded by subsequent stages — no direct runner-to-runner file transfer.

Passing Artifacts Between Stages (GitHub Actions)

Runners are isolated VMs. Files written to disk by one job are gone when the job ends. The canonical pattern: upload at the end of the build job, download at the start of every job that needs it. GitHub Actions provides actions/upload-artifact and actions/download-artifact for this.

# .github/workflows/ci.yml — artifact handoff between stages
name: CI

on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    outputs:
      version: ${{ steps.version.outputs.sha }}
    steps:
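      - uses: actions/checkout@v4

      - uses: actions/setup-go@v5
        with:
          go-version: '1.23.4'        # pin the exact toolchain instead of relying on the runner image

      - name: Compute version
        id: version
        run: echo "sha=${GITHUB_SHA::8}" >> "$GITHUB_OUTPUT"   # first 8 characters of the commit SHA

      - name: Build binary
        run: |
          go build -ldflags="-X main.Version=${{ steps.version.outputs.sha }}" \
            -o dist/myapp ./cmd/myapp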

      - name: Upload artifact
        uses: actions/upload-artifact@v4
        with:
          name: myapp-${{ steps.version.outputs.sha }}
          path: dist/
          retention-days: 7           # keep 7 days; delete older builds automatically

  integration-test:
    needs: build                      # waits for build to complete
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Download artifact
        uses: actions/download-artifact@v4
        with:
          name: myapp-${{ needs.build.outputs.version }}
          path: dist/

      - name: Run integration tests
        run: |
          chmod +x dist/myapp          # artifact uploads do not preserve file permissions
          ./dist/myapp --self-test

Always set retention-days. The default is 90 days, and in a busy repository where every PR produces artifacts, that default lets storage pile up against the account's quota. Set short TTLs (3-7 days) for intermediate build artifacts and longer TTLs only for release artifacts that may need investigation months later. The maximum is 90 days for public repositories and 400 days for private ones, so keep anything that must live longer in a package registry or object storage.
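One way to apply different TTLs in a single workflow is a conditional expression on retention-days. This sketch assumes that release builds are the ones triggered by tag pushes. GitHub Actions expressions have no ternary operator, so the `condition && a || b` idiom stands in for one.

```yaml
      - name: Upload artifact
        uses: actions/upload-artifact@v4
        with:
          name: myapp-${{ github.sha }}
          path: dist/
          # 90 days for tag-triggered release builds, 5 days for everything else
          retention-days: ${{ startsWith(github.ref, 'refs/tags/') && 90 || 5 }}
```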

Publishing to a Package Registry

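Container images are the most common production artifact. The publish step authenticates to a registry, derives tags with docker/metadata-action (an immutable commit-SHA tag plus convenience tags such as the branch name), then pushes the image under all of them. BuildKit layer caching, here using the GitHub Actions cache backend (which requires a Buildx builder from docker/setup-buildx-action), can substantially speed up repeat builds.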

  publish:
    needs: integration-test
    runs-on: ubuntu-latest
    permissions:
      packages: write              # required to push to GitHub Container Registry
      contents: read
    env:
      REGISTRY: ghcr.io
      IMAGE: ghcr.io/${{ github.repository }}
    steps:
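      - uses: actions/checkout@v4

      - name: Set up Docker Buildx          # the type=gha cache below needs a Buildx builder
        uses: docker/setup-buildx-action@v3

      - name: Log in to GHCR
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}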

      - name: Extract metadata for Docker
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.IMAGE }}
          tags: |
            type=sha,prefix=,format=short        # abc1234 (7-char short SHA; prefix= drops the default sha- prefix)
            type=ref,event=branch                # main
            type=semver,pattern={{version}}      # 2.4.1, only if the workflow also runs on v* tag pushes

      - name: Build and push image
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha               # GitHub Actions layer cache
          cache-to: type=gha,mode=max

Build Reproducibility

A build is reproducible when the same source commit always produces byte-for-byte identical artifacts. This matters for security (you can verify a binary matches its source) and debugging (you can rebuild a six-month-old release to investigate a CVE). Achieving it requires:

Lock files committed (go.sum, package-lock.json, Pipfile.lock, Gemfile.lock) — install with commands that honor them (npm ci rather than npm install) and never let CI resolve floating version ranges at build time.
Pinned base images — use a digest (FROM node:20@sha256:abc...) not a mutable tag (FROM node:latest).
Deterministic timestamps — set SOURCE_DATE_EPOCH (the Unix timestamp of the last commit) so tools that honor it, including many compilers, archivers and image builders, embed that fixed time instead of the wall-clock time.
Fixed tool versions — specify go-version: '1.23.4', node-version: '20.11.0' exactly, not '20'.
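A sketch of checking reproducibility inside CI, assuming the Go module layout from the workflow above (./cmd/myapp) and GNU tar on an Ubuntu runner. It derives SOURCE_DATE_EPOCH from the last commit, builds the binary and a release tarball twice from a clean cache, and fails the step if the two archives differ.

```yaml
  reproducibility-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-go@v5
        with:
          go-version: '1.23.4'            # exact toolchain, not '1.23'

      - name: Pin timestamps to the last commit
        run: echo "SOURCE_DATE_EPOCH=$(git log -1 --pretty=%ct)" >> "$GITHUB_ENV"

      - name: Build twice and compare
        run: |
          for out in a b; do
            mkdir -p "/tmp/$out"
            go clean -cache                 # force a full rebuild each time
            go build -trimpath -o "/tmp/$out/myapp" ./cmd/myapp
            # Fixed file order, owner and mtime; gzip -n omits gzip's own timestamp
            tar --sort=name --mtime="@${SOURCE_DATE_EPOCH}" \
                --owner=0 --group=0 --numeric-owner \
                -C "/tmp/$out" -cf - myapp | gzip -n > "/tmp/$out/myapp.tar.gz"
          done
          sha256sum /tmp/a/myapp.tar.gz /tmp/b/myapp.tar.gz
          cmp /tmp/a/myapp.tar.gz /tmp/b/myapp.tar.gz   # non-zero exit fails the step
```

Two builds on the same runner only catch nondeterminism inside the build itself; a stronger check rebuilds on a second machine, which is where -trimpath (keeping local paths out of the Go binary) starts to matter.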

Mutable tags are a production hazard. If your deploy references myapp:latest and someone pushes a broken image with the same tag, every subsequent deployment pulls the broken image. Always deploy by digest (myapp@sha256:...) or by the immutable commit-SHA tag, and treat latest as a convenience alias only for local development.
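A minimal sketch of deploying by digest, written as extra steps for the publish job above. It assumes the Build and push image step is given id: build so its digest output can be read, that kubectl is already configured for the target cluster, and that the Deployment and container are both named myapp; those names are hypothetical.

```yaml
      - name: Resolve the immutable image reference
        run: |
          # docker/build-push-action's digest output has the form sha256:<hex>
          echo "IMAGE_REF=${{ env.IMAGE }}@${{ steps.build.outputs.digest }}" >> "$GITHUB_ENV"

      - name: Deploy by digest, never by a mutable tag
        # Hypothetical Deployment and container names; requires a configured kubectl
        run: kubectl set image deployment/myapp myapp="$IMAGE_REF"
```

Because the reference pins the digest, re-pushing any tag later cannot change what this deployment runs.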

Artifact Signing and Attestation

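SLSA (Supply-chain Levels for Software Artifacts), a framework that originated at Google, formalizes this: each build produces signed provenance recording which pipeline run built the artifact from which source commit. Downstream consumers can then verify that an artifact came from that pipeline and was not injected by an attacker who compromised the registry. GitHub Actions supports this natively via actions/attest-build-provenance, which signs SLSA provenance using Sigstore and stores the attestation with GitHub. GitHub describes this as SLSA v1.0 Build Level 2 on its own, with Build Level 3 reachable when the build runs in a shared reusable workflow. Provenance of this kind is increasingly expected by enterprise compliance regimes (NIST SSDF, EU Cyber Resilience Act).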
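A sketch of adding provenance to the build job from the CI workflow above. The two extra permissions let the job obtain an OIDC token for Sigstore signing and write the attestation; the subject path matches that workflow's dist/myapp, and the job's other keys and steps stay as they are.

```yaml
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      id-token: write        # OIDC token used for keyless (Sigstore) signing
      attestations: write    # allows the job to store the attestation
    steps:
      # ... checkout, version and build steps from the CI workflow above ...

      - name: Attest build provenance
        uses: actions/attest-build-provenance@v2
        with:
          subject-path: dist/myapp
```

A consumer can then check a downloaded binary with gh attestation verify dist/myapp --repo OWNER/REPO, where OWNER/REPO is a placeholder for the source repository.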

Common Failure Modes

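Artifact name collision — two parallel jobs upload an artifact with the same name. With actions/upload-artifact@v3, their files were merged into one artifact and same-path files silently overwritten; v4 rejects the second upload with an error (or, with overwrite: true, replaces the first artifact). Always include the job matrix variable or branch name in the artifact name.
Missing needs dependency — without needs, a downstream job can start before the artifact is uploaded and fail with an "artifact not found" error, so the outcome depends on timing. Always declare explicit needs.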
Uploading the entire repo — a misconfigured path: . uploads gigabytes and inflates storage costs. Be explicit: upload only the dist/ or build/ directory.
No retention policy — artifact storage costs grow with every run. Automate cleanup with retention policies or a nightly purge job.
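A sketch that applies the advice above to a matrix build: unique per-leg artifact names, an explicit needs, an explicit path, a retention period, and if-no-files-found: error so an empty upload fails loudly. The GOOS matrix and the cmd/myapp layout are assumptions carried over from the earlier example.

```yaml
jobs:
  build:
    strategy:
      matrix:
        goos: [linux, darwin, windows]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-go@v5
        with:
          go-version: '1.23.4'

      - name: Cross-compile
        env:
          GOOS: ${{ matrix.goos }}
          GOARCH: amd64
        run: mkdir -p dist && go build -o dist/ ./cmd/myapp

      - uses: actions/upload-artifact@v4
        with:
          name: myapp-${{ matrix.goos }}-${{ github.sha }}   # unique per matrix leg, so no collision
          path: dist/                                         # upload only the build output
          if-no-files-found: error                            # fail instead of uploading nothing
          retention-days: 7

  package:
    needs: build                  # waits for every matrix leg to finish uploading
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4
        with:
          pattern: myapp-*-${{ github.sha }}
          path: dist/             # each artifact lands in dist/<artifact-name>/
      - run: ls -R dist/
```

Leaving merge-multiple unset keeps each platform's files in its own subdirectory, which avoids a second collision when the linux and darwin binaries share the name myapp.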