Continuous Integration Fundamentals

Artifacts & Build Outputs

18 min Lesson 6 of 28

Every CI pipeline produces outputs: compiled binaries, container images, test reports, coverage files, signed packages. These outputs are called artifacts. Treating artifacts as first-class citizens — with deterministic versioning, secure storage, and reliable inter-stage handoff — is what separates a hobbyist pipeline from a production-grade system at big-tech scale.

What Counts as an Artifact?

An artifact is any file produced by one stage of a pipeline that is either (a) consumed by a later stage, or (b) published for external use. Common examples include:

  • Compiled binaries — Go, Rust, Java JARs/WARs, .NET DLLs
  • Container images — OCI-format layers pushed to a registry
  • Language packages — npm tarballs, Python wheels, Ruby gems, Maven artifacts
  • Static sites — the output of next build, hugo, vite build
  • Test & coverage reports — JUnit XML, lcov HTML, SARIF files
  • Infrastructure plans — terraform plan JSON saved before apply
  • Release archives — tarballs or ZIPs attached to a GitHub Release

Artifact Storage: Where and Why It Matters

Artifacts must be stored outside the ephemeral runner. When a runner VM is recycled — or when a parallel job on a different machine needs the artifact — it must be retrievable from a stable, authenticated location. The main tiers are:

  • CI-native artifact stores — GitHub Actions Artifacts (backed by Azure Blob), GitLab Job Artifacts (S3-compatible object storage). Zero-config, but subject to plan-dependent storage quotas and a short default retention (90 days on GitHub).
  • Package registries — GitHub Packages, GitLab Package Registry, JFrog Artifactory, Sonatype Nexus. The right home for versioned, publishable artifacts (JAR, npm, Docker image). Most refuse to overwrite a published version such as 1.2.3, or can be configured to. Container image tags are the notable exception: they stay mutable unless the registry enforces tag immutability.
  • Object storage — AWS S3, GCS, Azure Blob. Used by large teams for everything that does not fit a package registry: Terraform plans, large ML model checkpoints, browser test videos.
Immutability is the contract. Once an artifact is published under a version tag (e.g., v2.4.1), it must never be overwritten. Any change, even a single byte, must produce a new version. This is why registries such as npm and PyPI reject re-uploads of an existing version: a mutable artifact makes the entire supply chain untrustworthy.
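To make the contract concrete, here is a minimal in-memory sketch, not any registry's real API: `ImmutableArtifactStore`, `publish`, and `fetch` are hypothetical names. It keys artifacts by (name, version), records a SHA-256 digest, and refuses to overwrite a version with different bytes.

```python
import hashlib


class ImmutableArtifactStore:
    """Hypothetical in-memory stand-in for a registry that never overwrites a version."""

    def __init__(self) -> None:
        # (name, version) -> (sha256 digest, content bytes)
        self._blobs: dict[tuple[str, str], tuple[str, bytes]] = {}

    def publish(self, name: str, version: str, content: bytes) -> str:
        key = (name, version)
        digest = "sha256:" + hashlib.sha256(content).hexdigest()
        if key in self._blobs:
            existing_digest, _ = self._blobs[key]
            if existing_digest == digest:
                # Byte-identical re-upload (e.g. a retried CI job): accept as a no-op.
                return digest
            raise ValueError(f"{name}@{version} already published as {existing_digest}")
        self._blobs[key] = (digest, content)
        return digest

    def fetch(self, name: str, version: str) -> bytes:
        digest, content = self._blobs[(name, version)]
        # Re-hash on read so silent corruption in storage is detected.
        if "sha256:" + hashlib.sha256(content).hexdigest() != digest:
            raise RuntimeError(f"digest mismatch for {name}@{version}")
        return content
```

Treating a byte-identical re-upload as a no-op is a design choice that keeps retried jobs idempotent; some registries, npm among them, reject even that.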

Deterministic Versioning Schemes

Artifact versions must be unique, traceable back to a commit, and sortable. The three patterns used in production are:

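  1. SemVer from git tag — v2.4.1, triggered by pushing the git tag v2.4.1. Standard for public packages. Tools: git describe --tags, semantic-release.
  2. Commit SHA suffix — 2.4.1-abc1234f. Every merge to main produces an artifact, and the SHA makes the version traceable without a tag. Used heavily for internal services. In SemVer terms the hyphen marks a pre-release, so 2.4.1-abc1234f sorts before 2.4.1, and the SHA part itself carries no chronological order.
  3. CalVer + build number — 2025.06.1042. Common in monorepos and mobile releases (the App Store requires numeric build numbers). The build number is a monotonically increasing CI counter, such as github.run_number in GitHub Actions.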

In GitHub Actions, ${{ github.sha }} and the GITHUB_SHA environment variable hold the full 40-character commit SHA; there is no built-in short form, so slice it in bash: SHA=$(echo $GITHUB_SHA | head -c8).
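The three schemes and the short-SHA slicing can be sketched as plain functions. All function names are hypothetical helpers, and the regular expression assumes release tags of the form vMAJOR.MINOR.PATCH with no pre-release suffix.

```python
import re
from datetime import date

SEMVER_TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def short_sha(full_sha: str, length: int = 8) -> str:
    """Same result as `echo $GITHUB_SHA | head -c8`."""
    if not re.fullmatch(r"[0-9a-f]{40}", full_sha):
        raise ValueError("expected a full 40-character hex commit SHA")
    return full_sha[:length]


def semver_from_tag(tag: str) -> str:
    """Scheme 1: 'v2.4.1' -> '2.4.1'."""
    m = SEMVER_TAG.match(tag)
    if m is None:
        raise ValueError(f"not a release tag: {tag!r}")
    return ".".join(m.groups())


def semver_with_sha(base: str, full_sha: str) -> str:
    """Scheme 2: '2.4.1' plus the commit SHA -> '2.4.1-abc1234f'."""
    return f"{base}-{short_sha(full_sha)}"


def calver(build_number: int, today: date) -> str:
    """Scheme 3: year.month.build, e.g. '2025.06.1042'."""
    return f"{today.year}.{today.month:02d}.{build_number}"


def calver_sort_key(version: str) -> tuple[int, ...]:
    # Compare numerically: plain string sorting would put 2025.06.999 after 2025.06.1042.
    return tuple(int(part) for part in version.split("."))
```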

[Figure: Artifact flow through pipeline stages. Build (compile + package) → Test (unit + integration) → Publish (push to registry) → Deploy (pull & rollout). Stages upload to and download from a central Artifact Store (S3 / Registry / CI cache); Deploy pulls the image myapp:2.4.1-abc1234f, where version = semver + short SHA, immutable once stored.]
Artifacts are uploaded to a central store after the build stage, then downloaded by subsequent stages — no direct runner-to-runner file transfer.

Passing Artifacts Between Stages (GitHub Actions)

Each job runs on a fresh, isolated runner (a new VM for GitHub-hosted runners), so files written to disk by one job are gone when the job ends. The canonical pattern: upload at the end of the build job, download at the start of every job that needs it. GitHub Actions provides actions/upload-artifact and actions/download-artifact for this.

```yaml
# .github/workflows/ci.yml — artifact handoff between stages
name: CI

on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    outputs:
      version: ${{ steps.version.outputs.sha }}
    steps:
      - uses: actions/checkout@v4

      - name: Compute version
        id: version
        run: echo "sha=$(echo $GITHUB_SHA | head -c8)" >> $GITHUB_OUTPUT

      - name: Build binary
        run: |
          go build -ldflags="-X main.Version=${{ steps.version.outputs.sha }}" \
            -o dist/myapp ./cmd/myapp

      - name: Upload artifact
        uses: actions/upload-artifact@v4
        with:
          name: myapp-${{ steps.version.outputs.sha }}
          path: dist/
          retention-days: 7  # keep 7 days; delete older builds automatically

  integration-test:
    needs: build  # waits for build to complete
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Download artifact
        uses: actions/download-artifact@v4
        with:
          name: myapp-${{ needs.build.outputs.version }}
          path: dist/

      - name: Run integration tests
        run: |
          chmod +x dist/myapp  # artifacts are zipped, so the executable bit is not preserved
          ./dist/myapp --self-test
```
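To show why the upload/download pair is needed, the sketch below simulates two jobs in Python. Each job gets its own throwaway temporary directory (standing in for a fresh runner), and the only shared state is a hypothetical `ArtifactStore` directory (standing in for the CI artifact service). It illustrates the pattern only; it is not how GitHub Actions is implemented.

```python
import shutil
import tempfile
from pathlib import Path


class ArtifactStore:
    """Hypothetical stand-in for the CI artifact service: storage outside any runner."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def upload(self, name: str, src: Path) -> None:
        dest = self.root / name
        if dest.exists():
            # Like upload-artifact@v4: reusing a name within one run is an error.
            raise FileExistsError(f"artifact {name!r} already exists in this run")
        shutil.copytree(src, dest)

    def download(self, name: str, dest: Path) -> None:
        src = self.root / name
        if not src.exists():
            raise FileNotFoundError(f"artifact {name!r} not found (missing needs?)")
        shutil.copytree(src, dest, dirs_exist_ok=True)


def build_job(store: ArtifactStore, version: str) -> None:
    with tempfile.TemporaryDirectory() as workspace:  # fresh runner, deleted afterwards
        dist = Path(workspace) / "dist"
        dist.mkdir()
        (dist / "myapp").write_text(f"binary built at {version}\n")
        store.upload(f"myapp-{version}", dist)


def integration_test_job(store: ArtifactStore, version: str) -> str:
    with tempfile.TemporaryDirectory() as workspace:  # a different, empty runner
        dist = Path(workspace) / "dist"
        store.download(f"myapp-{version}", dist)
        return (dist / "myapp").read_text()
```

Nothing passes between the two functions except what goes through the store, which mirrors the "no runner-to-runner transfer" rule.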
Always set retention-days. The default is 90 days, and in a busy repository unset retention lets artifact storage grow quickly because every push and PR produces artifacts. Set short TTLs (3-7 days) for intermediate build artifacts. Release artifacts that may need investigation months later deserve longer retention (GitHub caps retention-days at 90 for public repositories and 400 for private ones), or better, a permanent home in a package registry or GitHub Release.
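A tiered retention rule is simple enough to express as a pure function, roughly what a nightly purge job would evaluate. The `StoredArtifact` record and the 7-day/90-day constants are assumptions picked from the ranges above; in practice the artifact list would come from your CI provider's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical policy values chosen from the ranges suggested above.
INTERMEDIATE_TTL = timedelta(days=7)
RELEASE_TTL = timedelta(days=90)


@dataclass(frozen=True)
class StoredArtifact:
    name: str
    created_at: datetime  # timezone-aware, e.g. UTC
    is_release: bool


def expired(artifacts: list[StoredArtifact], now: datetime) -> list[str]:
    """Return names of artifacts whose TTL has elapsed, oldest first."""
    doomed = [
        a for a in artifacts
        if now - a.created_at > (RELEASE_TTL if a.is_release else INTERMEDIATE_TTL)
    ]
    return [a.name for a in sorted(doomed, key=lambda a: a.created_at)]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [StoredArtifact("pr-17-build", now - timedelta(days=10), False)]
    print(expired(sample, now))  # the 10-day-old PR build is past its 7-day TTL
```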

Publishing to a Package Registry

Container images are the most common production artifact. The publish step authenticates to a registry, tags the image with a commit-specific tag plus convenience tags (the branch name, and the SemVer version when a release tag is pushed), then pushes them all. Docker BuildKit with a layer cache can substantially speed up repeat builds, because unchanged layers are reused instead of rebuilt.

```yaml
# Add under jobs: in the same workflow as build and integration-test
publish:
  needs: integration-test
  runs-on: ubuntu-latest
  permissions:
    packages: write  # required to push to GitHub Container Registry
    contents: read
  env:
    REGISTRY: ghcr.io
    IMAGE: ghcr.io/${{ github.repository }}
  steps:
    - uses: actions/checkout@v4

    - name: Set up Docker Buildx  # the type=gha cache below needs a Buildx builder
      uses: docker/setup-buildx-action@v3

    - name: Log in to GHCR
      uses: docker/login-action@v3
      with:
        registry: ghcr.io
        username: ${{ github.actor }}
        password: ${{ secrets.GITHUB_TOKEN }}

    - name: Extract metadata for Docker
      id: meta
      uses: docker/metadata-action@v5
      with:
        images: ${{ env.IMAGE }}
        # Comments must stay outside the tags: | block, or they become part of the tag rules.
        # sha    -> e.g. abc1234 (short SHA, 7 characters by default)
        # branch -> main
        # semver -> 2.4.1, only when a v2.4.1 tag is pushed
        tags: |
          type=sha,prefix=,format=short
          type=ref,event=branch
          type=semver,pattern={{version}}

    - name: Build and push image
      uses: docker/build-push-action@v6
      with:
        context: .
        push: true
        tags: ${{ steps.meta.outputs.tags }}
        labels: ${{ steps.meta.outputs.labels }}
        cache-from: type=gha  # GitHub Actions layer cache
        cache-to: type=gha,mode=max
```
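The metadata step is what turns one push into several tags. As a rough, hypothetical approximation of the three tag rules configured above (not docker/metadata-action's actual code), the mapping from a Git ref to image tags looks like this. It assumes the action's default 7-character short SHA, handles only vMAJOR.MINOR.PATCH tags, and ignores extra tags such as latest that the action's flavor settings may add.

```python
import re


def image_tags(image: str, sha: str, ref: str) -> list[str]:
    """Approximate the configured tag rules for one push event.

    ref is a full Git ref such as 'refs/heads/main' or 'refs/tags/v2.4.1'.
    """
    tags = [f"{image}:{sha[:7]}"]  # type=sha,format=short
    if ref.startswith("refs/heads/"):
        branch = ref.removeprefix("refs/heads/")
        # Docker tags allow only [A-Za-z0-9_.-], so replace anything else (e.g. '/').
        tags.append(f"{image}:{re.sub(r'[^A-Za-z0-9_.-]', '-', branch)}")  # type=ref,event=branch
    else:
        m = re.fullmatch(r"refs/tags/v?(\d+\.\d+\.\d+)", ref)
        if m:
            tags.append(f"{image}:{m.group(1)}")  # type=semver,pattern={{version}}
    return tags
```

A push to main yields the SHA and branch tags; pushing the tag v2.4.1 yields the SHA and 2.4.1 tags.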

Build Reproducibility

A build is reproducible when the same source commit always produces byte-for-byte identical artifacts. This matters for security (you can verify a binary matches its source) and debugging (you can rebuild a six-month-old release to investigate a CVE). Achieving it requires:

  • Lock files committed (go.sum, package-lock.json, Pipfile.lock, Gemfile.lock) — install strictly from them (for example, npm ci) instead of re-resolving version ranges in CI.
  • Pinned base images — use a digest (FROM node:20@sha256:abc...) not a mutable tag (FROM node:latest).
  • Deterministic timestamps — set SOURCE_DATE_EPOCH (Unix timestamp of the last commit) so compressors and archivers do not embed wall-clock time.
  • Fixed tool versions — specify go-version: '1.23.4', node-version: '20.11.0' exactly, not '20'.
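The timestamp and ordering points can be demonstrated with the standard library alone. This sketch (function name hypothetical) builds a .tar.gz whose bytes depend only on the file contents and the epoch: entries are sorted, ownership and mode are fixed, and both the tar headers and the gzip header use the supplied timestamp instead of the current time. In CI the epoch would typically come from `git log -1 --format=%ct`.

```python
import gzip
import hashlib
import io
import os
import tarfile


def deterministic_tar_gz(files: dict[str, bytes], source_date_epoch: int) -> bytes:
    """Pack files so identical inputs always yield identical bytes."""
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w") as tar:
        for name in sorted(files):  # fixed entry order, independent of input order
            data = files[name]
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = source_date_epoch  # no wall-clock timestamps
            info.mode = 0o644
            info.uid = info.gid = 0
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    # gzip also stores a timestamp in its header; pin it to the same epoch.
    return gzip.compress(raw.getvalue(), mtime=source_date_epoch)


if __name__ == "__main__":
    # Assumes SOURCE_DATE_EPOCH was exported, e.g. from git log -1 --format=%ct.
    epoch = int(os.environ.get("SOURCE_DATE_EPOCH", "0"))
    archive = deterministic_tar_gz({"myapp": b"binary bytes"}, epoch)
    print(hashlib.sha256(archive).hexdigest())
```

Running the script twice with the same SOURCE_DATE_EPOCH prints the same digest, which is exactly the property a verifier relies on.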
Mutable tags are a production hazard. If your deploy references myapp:latest and someone pushes a broken image with the same tag, every subsequent deployment pulls the broken image. Always deploy by digest (myapp@sha256:...) or by the immutable commit-SHA tag, and treat latest as a convenience alias only for local development.
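A deploy pipeline can enforce this rule mechanically before rollout. The check below is a hypothetical policy sketch: it accepts a digest reference, or a tag that is a hex commit SHA optionally prefixed by a SemVer version (the myapp:2.4.1-abc1234f style used in this lesson), and rejects everything else, including latest and branch tags. It only inspects the reference string; it does not query a registry.

```python
import re

# name[:tag]@sha256:<64 hex>: pinned to content, not to a movable tag
DIGEST_REF = re.compile(r"^(?P<name>[^@\s]+)@sha256:(?P<hex>[0-9a-f]{64})$")
# Hypothetical policy: a tag counts as pinned only if it ends in a hex commit SHA.
SHA_TAG_REF = re.compile(r"^(?P<name>[^@\s]+):(?P<tag>(?:\d+\.\d+\.\d+-)?[0-9a-f]{7,40})$")


def check_deploy_ref(ref: str) -> str:
    """Return 'digest' or 'sha-tag' for acceptable references; raise otherwise."""
    if DIGEST_REF.match(ref):
        return "digest"
    if SHA_TAG_REF.match(ref):
        return "sha-tag"
    raise ValueError(f"refusing mutable image reference: {ref}")
```

Digest references are preferred because a SHA-style tag is only immutable by convention unless the registry enforces it.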

Artifact Signing and Attestation

Supply-chain frameworks such as SLSA, which grew out of Google's internal build-integrity practices, call for every build artifact to carry signed provenance. This lets downstream consumers verify that an artifact was produced by a specific pipeline run from a specific commit, not injected by an attacker who compromised the registry. GitHub Actions supports this natively via actions/attest-build-provenance, which generates a Sigstore-signed SLSA provenance attestation and stores it through GitHub's attestations API. On its own this reaches SLSA Build Level 2; GitHub documents running the build in a reusable workflow as the path to Level 3. Signed provenance is increasingly expected by enterprise compliance frameworks (NIST SSDF, EU Cyber Resilience Act).
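Verification has two halves: checking the signature over the provenance, and checking that the signed statement actually names the bytes you are about to deploy. The sketch below covers only the second half, assuming the attestation is an in-toto Statement v1 carrying an SLSA v1 provenance predicate; the function name is hypothetical. Signature verification must happen first with real tooling (for example `gh attestation verify`), otherwise anyone could write a matching statement.

```python
import hashlib
import json

SLSA_PROVENANCE_V1 = "https://slsa.dev/provenance/v1"


def subject_matches(statement_json: str, artifact: bytes, name: str) -> bool:
    """Check that an (already signature-verified) statement names this artifact.

    Only the digest-binding step: the statement's signature is NOT checked here.
    """
    statement = json.loads(statement_json)
    if statement.get("predicateType") != SLSA_PROVENANCE_V1:
        return False
    digest = hashlib.sha256(artifact).hexdigest()
    return any(
        s.get("name") == name and s.get("digest", {}).get("sha256") == digest
        for s in statement.get("subject", [])
    )
```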

Common Failure Modes

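  • Artifact name collision — two parallel jobs (for example, matrix legs) upload an artifact with the same name. With actions/upload-artifact@v4 the second upload fails with a conflict error; older versions silently merged into or overwrote the first upload. Always include the job matrix variable or branch name in the artifact name.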
  • Missing needs dependency — a downstream job starts before the artifact is uploaded, gets a 404, and fails non-deterministically. Always declare explicit needs.
  • Uploading the entire repo — a misconfigured path: . uploads gigabytes and inflates storage costs. Be explicit: upload only the dist/ or build/ directory.
  • No retention policy — storage costs accumulate with every run. Automate cleanup with retention policies or a nightly purge job.
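Most of these failure modes are visible statically in the workflow file. The hypothetical linter below works on a workflow already parsed into a Python dict (for example with a YAML library, which is outside the standard library and not shown). It compares only literal artifact names, so names built from ${{ }} expressions on both sides are not matched, and it follows needs transitively because a job also waits for its dependencies' dependencies.

```python
def _needs(job: dict) -> list[str]:
    needs = job.get("needs", [])
    return [needs] if isinstance(needs, str) else list(needs)


def _upstream(jobs: dict, job_id: str) -> set[str]:
    """All jobs that must finish before job_id (needs followed transitively)."""
    seen: set[str] = set()
    stack = _needs(jobs[job_id])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(_needs(jobs.get(dep, {})))
    return seen


def lint_workflow(workflow: dict) -> list[str]:
    """Flag the failure modes above in a workflow already parsed into a dict."""
    jobs = workflow.get("jobs", {})
    problems: list[str] = []
    producers: dict[str, str] = {}  # literal artifact name -> uploading job

    for job_id, job in jobs.items():
        in_matrix = "matrix" in job.get("strategy", {})
        for step in job.get("steps", []):
            if not step.get("uses", "").startswith("actions/upload-artifact@"):
                continue
            args = step.get("with", {})
            name = args.get("name", "artifact")  # upload-artifact's default name
            if name in producers or (in_matrix and "matrix." not in name):
                problems.append(f"{job_id}: artifact name {name!r} can collide")
            producers.setdefault(name, job_id)
            if str(args.get("path", "")).strip() in (".", "./"):
                problems.append(f"{job_id}: uploads the entire workspace")
            if "retention-days" not in args:
                problems.append(f"{job_id}: {name!r} has no retention-days")

    for job_id, job in jobs.items():
        for step in job.get("steps", []):
            if not step.get("uses", "").startswith("actions/download-artifact@"):
                continue
            producer = producers.get(step.get("with", {}).get("name", ""))
            if producer and producer not in _upstream(jobs, job_id):
                problems.append(f"{job_id}: downloads from {producer!r} without needs")
    return problems
```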