Continuous Integration Fundamentals

Project: Design a CI Pipeline

The best way to cement everything you have learned about continuous integration is to design a complete pipeline from scratch: not a "hello world" workflow, but a production-grade spec for a real service. This lesson walks you through that design exercise end to end. You will choose a sample service, enumerate the pipeline's jobs, wire them together, add security gates, handle secrets, and produce a specification that would hold up in a senior engineer's design review.

The Sample Service

We will design a CI pipeline for OrderService, a microservice written in Go that exposes a REST API, writes to PostgreSQL, publishes events to a Kafka topic, and is deployed as a container on Kubernetes. This stack is representative of the backend services you will encounter at scale. The repository layout is:

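order-service/
├── .github/workflows/ci.yml   # the pipeline designed in this lesson
├── .golangci.yml              # linter config used by the Validate stage
├── cmd/
│   ├── orderservice/main.go   # service entry point
│   └── migrate/main.go        # migration runner invoked by CI
├── internal/
│   ├── handler/        # HTTP handlers
│   ├── store/          # Postgres queries
│   └── event/          # Kafka publisher
├── migrations/         # SQL migration files
├── k8s/                # Kubernetes manifests
├── Dockerfile
├── docker-compose.yml  # local dev environment (Postgres + Kafka)
├── Makefile
├── go.mod
└── go.sum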

Pipeline Goals

Before writing a single YAML line, write down your goals. Every stage you add must serve at least one of these:

Fast feedback — developers learn within 5 minutes whether their change is correct.
Consistent environment — the build is hermetic; it produces the same output regardless of who triggers it.
Security gates — no secret leaks; all dependencies scanned for CVEs; SLSA provenance attached.
Deployable artifact — the pipeline's final output is an OCI image pushed to a registry, tagged by commit SHA, ready to be picked up by CD.

Design before you implement. Teams that jump straight to writing YAML end up with pipelines that are brittle, slow, and duplicative. A 30-minute design session — answering "what do we need to verify, in what order, and how fast?" — will save weeks of pipeline firefighting later.

Stage-by-Stage Pipeline Design

The pipeline has six jobs. They are not all sequential — parallelism is the key to meeting the 5-minute target. Here is the full dependency graph:

OrderService CI pipeline: Validate and Build are sequential gating stages; Unit Test, Integration Test, and Security Scan run in parallel; Publish waits for all three.
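
Treating the graph as data makes the payoff of the fan-out concrete. The following is a minimal Go sketch, assuming unlimited runners and per-job durations that are purely hypothetical (measure your own from workflow run timings); the `job` type and `finishTimes` function are illustrative names, not part of any GitHub API.

```go
package main

import (
	"fmt"
	"sort"
)

// job is one CI job: a hypothetical duration in minutes plus the jobs it
// lists under `needs:`.
type job struct {
	minutes float64
	needs   []string
}

// finishTimes returns each job's earliest finish time, assuming unlimited
// runners so a job starts the moment its slowest dependency finishes. The
// graph must be acyclic, which GitHub Actions already enforces for `needs`.
func finishTimes(jobs map[string]job) map[string]float64 {
	finish := make(map[string]float64)
	var visit func(name string) float64
	visit = func(name string) float64 {
		if t, ok := finish[name]; ok {
			return t // memoized: each job is computed once
		}
		start := 0.0
		for _, dep := range jobs[name].needs {
			if t := visit(dep); t > start {
				start = t
			}
		}
		finish[name] = start + jobs[name].minutes
		return finish[name]
	}
	for name := range jobs {
		visit(name)
	}
	return finish
}

func main() {
	// Hypothetical durations; job names match this lesson's workflow.
	jobs := map[string]job{
		"validate":         {1, nil},
		"build":            {1, []string{"validate"}},
		"unit-test":        {2, []string{"build"}},
		"integration-test": {3, []string{"build"}},
		"security-scan":    {2, []string{"build"}},
		"publish":          {1.5, []string{"build", "unit-test", "integration-test", "security-scan"}},
	}
	finish := finishTimes(jobs)

	names := make([]string, 0, len(jobs))
	serial := 0.0
	for name, j := range jobs {
		names = append(names, name)
		serial += j.minutes
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%-17s done at %4.1f min\n", name, finish[name])
	}
	fmt.Printf("pipeline: %.1f min with fan-out, %.1f min fully serial\n", finish["publish"], serial)
}
```

With these made-up numbers every gate has reported by minute 5 and the image is published at 6.5 minutes; run back to back, the same six jobs would take 10.5 minutes. The critical path (validate, build, integration-test, publish) shows which job to optimize first.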

Stage 1 — Validate (lint, vet, format)

Validation is the fastest stage and should fail loudest. Formatting checks, linters, and go vet catch style violations, unused code, shadowed variables, and unreachable code without building release binaries or starting test containers. Running it first means a developer who forgot to run gofmt gets feedback in about a minute, not after waiting for a full build and test run.

# .github/workflows/ci.yml  (top of file — triggers & defaults)
name: CI — OrderService

on:
  push:
    branches: [main, 'release/**']
    tags: ['v*']                 # v2.4.1-style git tags produce the semver image tag
  pull_request:
    branches: [main]

permissions:
  contents: read                 # least-privilege default; publish declares its own

defaults:
  run:
    shell: bash

env:
  GO_VERSION: '1.23.4'   # pinned — never use 'stable' or '1.x'
  IMAGE: ghcr.io/${{ github.repository }}

jobs:
  validate:
    name: Lint & Vet
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-go@v5
        with:
          go-version: ${{ env.GO_VERSION }}
          cache: true                  # caches module downloads and the Go build cache

      - name: golangci-lint
        uses: golangci/golangci-lint-action@v6
        with:
          version: v1.61.0            # pinned; never use 'latest'
          args: --timeout=5m --config=.golangci.yml

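      - name: gofmt check
        run: |
          unformatted=$(gofmt -l .)
          if [ -n "$unformatted" ]; then
            echo "::error::Files not formatted with gofmt:"
            echo "$unformatted"
            exit 1
          fi

      - name: go vet
        run: go vet ./...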

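The gofmt check the Validate stage relies on asks one question per file: is it already in canonical form? A minimal sketch of that check using the standard library's go/format package, useful for a small repo-specific pre-commit tool; `needsGofmt` is a hypothetical name, and walking the tree and reading files are left out.

```go
package main

import (
	"bytes"
	"fmt"
	"go/format"
)

// needsGofmt reports whether src differs from its canonical gofmt form,
// the same per-file question `gofmt -l` answers. A syntax error is
// returned as an error rather than reported as a formatting problem.
func needsGofmt(src []byte) (bool, error) {
	formatted, err := format.Source(src)
	if err != nil {
		return false, err
	}
	return !bytes.Equal(src, formatted), nil
}

func main() {
	samples := map[string]string{
		"messy.go": "package main\nfunc main(){}\n",
		"clean.go": "package main\n\nfunc main() {}\n",
	}
	for name, src := range samples {
		dirty, err := needsGofmt([]byte(src))
		fmt.Printf("%s: needs gofmt=%v err=%v\n", name, dirty, err)
	}
}
```

Note that format.Source applies plain gofmt rules; the simplifications of gofmt -s are not included.
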
Stage 2 — Build

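The build stage compiles the service binary: statically linked, stripped, and with the short commit SHA baked in. Its main role is to fail fast on compilation errors before three parallel jobs start. The binary is uploaded as a short-lived artifact so each run's exact output can be downloaded and inspected. The gate jobs that follow do not consume it: go test compiles its own test binaries, so those jobs check out the source and rely on the Go build cache instead.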

  build:
    name: Build Binary
    needs: validate
    runs-on: ubuntu-latest
    outputs:
      version: ${{ steps.ver.outputs.sha }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version: ${{ env.GO_VERSION }}
          cache: true

      - name: Compute version
        id: ver
        run: |
          # 7 characters, matching the short-SHA tag docker/metadata-action emits
          echo "sha=${GITHUB_SHA::7}" >> "$GITHUB_OUTPUT"

      - name: Build static binary
        env:
          CGO_ENABLED: '0'            # fully static — no glibc dep in container
        run: |
          go build \
            -trimpath \
            -ldflags="-s -w -X main.Version=${{ steps.ver.outputs.sha }}" \
            -o dist/orderservice \
            ./cmd/orderservice

      - name: Upload binary artifact
        uses: actions/upload-artifact@v4
        with:
          name: orderservice-bin-${{ steps.ver.outputs.sha }}
          path: dist/orderservice
          retention-days: 3           # intermediate artifact; short TTL
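
The `-X main.Version=...` linker flag in the build step only has an effect if the main package declares a matching variable. A minimal sketch of what cmd/orderservice/main.go would need, assuming the variable is called Version as the flag implies; the `versionString` helper and the "dev" default are illustrative.

```go
package main

import "fmt"

// Version is replaced at link time by
//
//	go build -ldflags="-X main.Version=abc1234" ./cmd/orderservice
//
// -X can only set package-level string variables (not constants), and the
// path must match the variable's package and name exactly, here main.Version.
// Plain `go build` or `go run` keeps the "dev" default.
var Version = "dev"

// versionString is a hypothetical helper, e.g. for a --version flag or a
// health-check response.
func versionString() string {
	return "orderservice " + Version
}

func main() {
	fmt.Println(versionString())
}
```

If the name in -X does not match any variable, the build still succeeds and the binary silently reports "dev", so a quick check of the version output after a CI build is worthwhile.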

Stages 3, 4, 5 — Parallel Gates

After the build, three jobs run concurrently. Each declares needs: build, so GitHub Actions starts all three as soon as the build job succeeds. This fan-out is what keeps the pipeline within its 5-minute feedback target.

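Unit Test: runs go test -race -coverprofile=coverage.out ./... straight from the source tree; go test compiles its own test binaries, so this job does not need the build artifact. The -race flag enables Go's race detector, which reports data races that actually occur while the tests run. It is not free: the Go documentation estimates 5-10x memory and 2-20x execution time for a typical program, a cost worth paying on the unit-test suite. The coverage profile is uploaded as a CI artifact and can also be forwarded to Codecov (or similar). An 80% coverage gate is enforced: if the total reported by go tool cover -func=coverage.out is below 80%, the job fails.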

Integration Test: starts PostgreSQL and Kafka as GitHub Actions service containers (the services: block), mirroring the Docker Compose setup used for local development. It runs the SQL migrations, then executes the integration-test suite. This is the only job with external service dependencies; isolating them here means the other two parallel jobs never wait on container startup.

Security Scan: runs two tools. govulncheck checks your Go module graph against the Go vulnerability database and, by default, reports only vulnerabilities in functions your code can actually reach, which keeps false positives rare; any such finding fails the step. trivy scans for HIGH and CRITICAL CVEs, including OS packages inherited from the base image. A failure in either tool fails the job and blocks the Publish stage.

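The coverage gate described above boils down to three steps: find the summary line of the go tool cover -func output, strip the percent sign, compare. A minimal Go sketch of the same logic, handy if you would rather keep the gate in a small tool than in Bash; the function names and the module path in the sample report are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// totalCoverage extracts the percentage from the "total:" summary line that
// `go tool cover -func=coverage.out` prints last.
func totalCoverage(report string) (float64, error) {
	sc := bufio.NewScanner(strings.NewReader(report))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Per-function lines start with "file.go:line:", even for a function
		// named total, so matching the first field exactly avoids false hits.
		if len(fields) == 3 && fields[0] == "total:" {
			return strconv.ParseFloat(strings.TrimSuffix(fields[2], "%"), 64)
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("no total: line in coverage report")
}

// checkGate returns an error when total coverage is below minimum.
func checkGate(report string, minimum float64) error {
	pct, err := totalCoverage(report)
	if err != nil {
		return err
	}
	if pct < minimum {
		return fmt.Errorf("coverage %.1f%% is below the %.0f%% gate", pct, minimum)
	}
	return nil
}

func main() {
	report := "github.com/acme/order-service/internal/store/orders.go:12:\tInsert\t\t85.7%\n" +
		"total:\t\t\t\t\t\t(statements)\t78.4%\n"
	fmt.Println(checkGate(report, 80))
}
```
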
  unit-test:
    name: Unit Test
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version: ${{ env.GO_VERSION }}
          cache: true

      - name: Run unit tests with race detector
        run: |
          go test -race -coverprofile=coverage.out -covermode=atomic ./...
          # Anchor on "total:" so per-function lines (e.g. totalPrice) are not matched.
          COVERAGE=$(go tool cover -func=coverage.out | awk '/^total:/ {print $3}' | tr -d '%')
          echo "Total coverage: ${COVERAGE}%"
          if (( $(echo "$COVERAGE < 80" | bc -l) )); then
            echo "::error::Coverage ${COVERAGE}% is below the 80% gate"
            exit 1
          fi

      - name: Upload coverage report
        uses: actions/upload-artifact@v4
        with:
          name: coverage-report
          path: coverage.out
          retention-days: 7

  integration-test:
    name: Integration Test
    needs: build
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16-alpine
        ports:
          - 5432:5432              # steps run on the host, so service ports must be mapped
        env:
          POSTGRES_USER: orders
          POSTGRES_PASSWORD: orders
          POSTGRES_DB: orders_test
        options: >-
          --health-cmd pg_isready
          --health-interval 5s
          --health-timeout 5s
          --health-retries 10
      kafka:
        image: bitnami/kafka:3.7
        ports:
          - 9092:9092
        env:
          KAFKA_CFG_NODE_ID: '0'
          KAFKA_CFG_PROCESS_ROLES: controller,broker
          KAFKA_CFG_LISTENERS: PLAINTEXT://:9092,CONTROLLER://:9093
          KAFKA_CFG_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
          KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
          KAFKA_CFG_CONTROLLER_QUORUM_VOTERS: 0@localhost:9093
          KAFKA_CFG_CONTROLLER_LISTENER_NAMES: CONTROLLER
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version: ${{ env.GO_VERSION }}
          cache: true

      - name: Run DB migrations
        env:
          DATABASE_URL: postgres://orders:orders@localhost:5432/orders_test?sslmode=disable
        run: go run ./cmd/migrate up

      - name: Run integration tests
        env:
          DATABASE_URL: postgres://orders:orders@localhost:5432/orders_test?sslmode=disable
          KAFKA_BROKERS: localhost:9092
          INTEGRATION: 'true'
        run: go test -tags=integration -timeout=3m ./...

  security-scan:
    name: Security Scan
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version: ${{ env.GO_VERSION }}
          cache: true

      - name: govulncheck — Go module vulnerabilities
        run: |
          go install golang.org/x/vuln/cmd/govulncheck@v1.1.3   # pinned, like every other tool
          govulncheck ./...

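      - name: Build image for scanning (not pushed)
        run: docker build -t orderservice:scan .

      - name: Trivy image CVE scan (base image + Go deps)
        uses: aquasecurity/trivy-action@0.28.0   # pin by commit SHA in a real workflow
        with:
          scan-type: image
          image-ref: orderservice:scan
          severity: HIGH,CRITICAL
          exit-code: '1'              # fail on HIGH or CRITICAL findings
          ignore-unfixed: true        # skip vulnerabilities with no patch yet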

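To make the -race gate concrete, here is the kind of code it protects. A minimal sketch with a hypothetical `orderCounter` type, not taken from OrderService: the mutex makes concurrent `Inc` calls safe. Remove the Lock/Unlock pair and a unit test that calls `incConcurrently` would be flagged by go test -race, and might also crash with Go's fatal "concurrent map writes" error.

```go
package main

import (
	"fmt"
	"sync"
)

// orderCounter counts orders per status. The mutex serializes access to
// the map; without it, concurrent Inc calls are a data race.
type orderCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

func newOrderCounter() *orderCounter {
	return &orderCounter{counts: make(map[string]int)}
}

func (c *orderCounter) Inc(status string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[status]++
}

func (c *orderCounter) Get(status string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[status]
}

// incConcurrently simulates n handlers recording an order at the same time.
func incConcurrently(c *orderCounter, status string, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Inc(status)
		}()
	}
	wg.Wait()
}

func main() {
	c := newOrderCounter()
	incConcurrently(c, "created", 1000)
	fmt.Println("created orders:", c.Get("created"))
}
```

Because the detector only sees races that actually happen during a run, a test has to exercise the concurrent path, as `incConcurrently` does, for the gate to catch anything.
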
Stage 6 — Publish

The Publish job runs only when all three parallel gates pass. It builds the OCI image with Docker BuildKit and tags it with the short commit SHA plus either the branch name (on a branch push) or the semver version (when the run was triggered by a git tag such as v2.4.1). It then pushes to GitHub Container Registry and attaches an SLSA provenance attestation. The attestation records the exact workflow run, commit, and inputs, which satisfies SLSA Build Level 2 out of the box on GitHub's hosted runners.

  publish:
    name: Publish Image
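    # build is listed so needs.build.outputs.version (used below) resolves;
    # the needs context only exposes direct dependencies.
    needs: [build, unit-test, integration-test, security-scan]
    if: github.event_name == 'push'   # never publish images from pull requests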
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
      id-token: write               # required for SLSA attestation
      attestations: write
    steps:
      - uses: actions/checkout@v4

      - name: Log in to GHCR
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract Docker metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.IMAGE }}
          tags: |
            type=sha,prefix=,format=short          # abc1234 (7-char short SHA), primary deploy tag
            type=ref,event=branch                  # main
            type=semver,pattern={{version}}        # 2.4.1 on a git tag push

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Build and push image
        id: push
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
          build-args: |
            VERSION=${{ needs.build.outputs.version }}

      - name: Attest build provenance (SLSA)
        uses: actions/attest-build-provenance@v1
        with:
          subject-name: ${{ env.IMAGE }}
          subject-digest: ${{ steps.push.outputs.digest }}
          push-to-registry: true

      - name: Output deploy tag
        run: |
          echo "### Artifact ready for CD" >> "$GITHUB_STEP_SUMMARY"
          echo "Image: \`${{ env.IMAGE }}@${{ steps.push.outputs.digest }}\`" >> "$GITHUB_STEP_SUMMARY"

Secrets are injected into the runner environment at job scope; OIDC tokens are short-lived and never stored; SLSA provenance is pushed to the registry alongside the image it describes.

Secrets Design

Every secret in this pipeline follows the principle of least privilege. The rules applied here are:

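GITHUB_TOKEN is issued automatically for each job and expires when the job ends. It carries only the permissions declared in the permissions block: the Publish job requests packages: write, id-token: write, and attestations: write, and every other job should run with contents: read alone. Set permissions: contents: read at the workflow level so that jobs declaring nothing inherit that read-only default rather than the repository's default token permissions, which may be broader.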
No third-party secret (Sonar token, Slack webhook, etc.) is ever passed to a job that does not need it. Reference each secret in the env of the job or step that uses it, never in the workflow-level env block, which exposes it to every job.
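Secrets are never echoed, interpolated into URLs, or passed as command-line arguments, which can appear in the runner's process list. This pipeline's docker/login-action step takes the token as an input; if you call docker login yourself, pipe the secret through --password-stdin instead of passing --password $SECRET.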
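All external actions are pinned by full 40-character commit SHA (for example, actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683, which is v4.2.2), not by a mutable tag. This prevents a compromised action publisher from injecting malicious code into your pipeline. The listings in this lesson use version tags for readability; convert them to SHAs in your real workflow.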

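Pinning is easy to regress on, so it pays to check it mechanically in CI. A minimal Go sketch of such a check: it scans workflow text line by line with a regular expression instead of parsing YAML (a simplifying assumption that misses quoted or unusual uses: forms), and `unpinned` is a hypothetical name.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// usesLine captures action and ref from a workflow `uses:` line, e.g.
// "- uses: actions/checkout@v4" gives "actions/checkout" and "v4".
var usesLine = regexp.MustCompile(`^\s*(?:-\s*)?uses:\s*([^@\s]+)@([^\s#]+)`)

// fullSHA matches a 40-character git commit SHA.
var fullSHA = regexp.MustCompile(`^[0-9a-f]{40}$`)

// unpinned returns "action@ref" for every action whose ref is a tag or
// branch instead of a full commit SHA. Local actions ("./path") have no
// "@" and are skipped; docker:// references are not handled specially.
func unpinned(workflow string) []string {
	var out []string
	sc := bufio.NewScanner(strings.NewReader(workflow))
	for sc.Scan() {
		m := usesLine.FindStringSubmatch(sc.Text())
		if m != nil && !fullSHA.MatchString(m[2]) {
			out = append(out, m[1]+"@"+m[2])
		}
	}
	return out
}

func main() {
	wf := "steps:\n" +
		"  - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2\n" +
		"  - uses: actions/setup-go@v5\n"
	for _, ref := range unpinned(wf) {
		fmt.Println("not pinned by SHA:", ref)
	}
}
```

Run against this lesson's workflow, it would flag every tag-pinned reference, which is exactly the conversion work the rule asks for.
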
Never run untrusted pull-request code under pull_request_target. That event runs in the context of the base branch, with access to repository secrets and a GITHUB_TOKEN that can have write permissions, so a malicious PR whose code gets checked out and executed there can exfiltrate every secret in your repository. For external contributors, use pull_request (fork PRs get no secrets and a read-only token) and require maintainer approval before privileged jobs run.

Quality Gate Summary

A well-designed pipeline has explicit, documented quality gates that everyone on the team understands. Here is the gate card for OrderService:

lint: zero golangci-lint warnings (config in .golangci.yml, enforced on every PR)
unit coverage: at least 80% total statement coverage, as reported by go tool cover
race detector: zero data races detected while the unit tests run (-race flag)
integration: all integration tests pass against real Postgres 16 and Kafka 3.7
vulnerabilities: no reachable vulnerabilities reported by govulncheck, and zero fixable HIGH or CRITICAL CVEs in Go modules or the base image
build reproducibility: CGO_ENABLED=0, -trimpath, pinned Go version, pinned base image digest in Dockerfile
artifact integrity: SLSA Build Level 2 provenance attached to every image pushed to main

Write your quality gates into a PIPELINE.md in the repo root. When a gate fails and a developer asks "why does this pipeline fail on 78% coverage?", a documented gate policy ends the debate instantly. Document the rationale, not just the number: "80% is the minimum required to catch regressions in the store layer, which has no contract tests."

Common Failure Modes in Pipeline Design

Missing needs chains — the Publish job runs even when security-scan failed because the author forgot to list it in needs. Always list every gate job explicitly in the final stage's needs array.
Hardcoded credentials in YAML — a common mistake when moving fast. Audit every new workflow for literal secrets before merging.
Flaky integration tests — a test that depends on Kafka timing out occasionally will erode trust in CI until developers start re-running jobs without investigating. Fix flakes immediately; treat them as bugs, not annoyances.
No caching strategy — re-downloading Go modules and rebuilding Docker layers from scratch on every run can add 3-4 minutes of pure wait time. Use cache: true in setup-go and cache-from: type=gha in BuildKit.
Overly broad permissions — granting contents: write to every job because one job needs to push a tag. Scope permissions to the minimum each job needs.
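
The missing-needs failure mode above is easy to guard against mechanically once the workflow's needs graph has been loaded (for example with a YAML parser, which this sketch leaves out). A minimal Go sketch; `missingNeeds` is a hypothetical helper, and the map stands in for the parsed jobs.<id>.needs entries.

```go
package main

import (
	"fmt"
	"sort"
)

// missingNeeds returns the jobs in required that final does not list
// directly under `needs:`. Direct listing matters: only direct dependencies
// are exposed through the needs context (needs.<job>.outputs), and an
// explicit list documents which gates guard publishing.
func missingNeeds(needs map[string][]string, final string, required []string) []string {
	listed := make(map[string]bool, len(needs[final]))
	for _, n := range needs[final] {
		listed[n] = true
	}
	var missing []string
	for _, r := range required {
		if !listed[r] {
			missing = append(missing, r)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Hypothetical workflow where the author forgot build and security-scan.
	needs := map[string][]string{
		"publish": {"unit-test", "integration-test"},
	}
	required := []string{"build", "unit-test", "integration-test", "security-scan"}
	for _, job := range missingNeeds(needs, "publish", required) {
		fmt.Println("publish does not list:", job)
	}
}
```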