Continuous Integration Fundamentals

Project: Design a CI Pipeline


The best way to cement everything you have learned about continuous integration is to design a complete pipeline from scratch — not a "hello world" workflow, but a production-grade spec for a real service. This lesson walks you through that design exercise end-to-end: you will choose a sample service, enumerate the pipeline's jobs, wire them together, add security gates, handle secrets, and produce a specification that a senior engineer at Google or GitHub would recognise as production-ready.

The Sample Service

We will design a CI pipeline for OrderService, a microservice written in Go that exposes a REST API, writes to PostgreSQL, publishes events to a Kafka topic, and is deployed as a container on Kubernetes. This stack is representative of the backend services you will encounter at scale. The repository layout is:

    order-service/
    ├── cmd/orderservice/main.go
    ├── internal/
    │   ├── handler/          # HTTP handlers
    │   ├── store/            # Postgres queries
    │   └── event/            # Kafka publisher
    ├── migrations/           # SQL migration files
    ├── k8s/                  # Kubernetes manifests
    ├── Dockerfile
    ├── docker-compose.yml    # local dev + CI integration env
    ├── Makefile
    ├── go.mod
    └── go.sum

Pipeline Goals

Before writing a single YAML line, write down your goals. Every stage you add must serve at least one of these:

  1. Fast feedback — developers learn within 5 minutes whether their change is correct.
  2. Consistent environment — the build is hermetic; it produces the same output regardless of who triggers it.
  3. Security gates — no secret leaks; all dependencies scanned for CVEs; SLSA provenance attached.
  4. Deployable artifact — the pipeline's final output is an OCI image pushed to a registry, tagged by commit SHA, ready to be picked up by CD.
Design before you implement. Teams that jump straight to writing YAML end up with pipelines that are brittle, slow, and duplicative. A 30-minute design session — answering "what do we need to verify, in what order, and how fast?" — will save weeks of pipeline firefighting later.

Stage-by-Stage Pipeline Design

The pipeline has six jobs. They are not all sequential: running the three verification jobs in parallel is what keeps the wall-clock time close to the 5-minute target. Here is the full dependency graph:

[Figure: OrderService CI pipeline dependency graph. Validate (lint · vet · fmt, ~45 s) → Build (go build · binary, ~60 s) → in parallel: Unit Test (go test · coverage, ~90 s), Security Scan (govulncheck · trivy, ~80 s), Integration Test (Postgres · Kafka, ~2 min) → Publish (docker push · SLSA, ~90 s). Total wall-clock time: ~5 min 30 s (critical path through Integration Test).]
OrderService CI pipeline: Validate and Build are sequential gating stages; Unit Test, Integration Test, and Security Scan run in parallel; Publish waits for all three.

Stage 1 — Validate (lint, vet, format)

Validation is the fastest stage and should fail loudest. It catches style violations, unused imports, shadowed variables, and unreachable code before a minute is spent on compilation. Running it first means that a developer who forgot to run gofmt gets feedback in under a minute, not after waiting for a full build.

    # .github/workflows/ci.yml (top of file: triggers & defaults)
    name: CI — OrderService

    on:
      push:
        branches: [main, 'release/**']
      pull_request:
        branches: [main]

    # Read-only by default; the publish job widens its own permissions.
    permissions:
      contents: read

    defaults:
      run:
        shell: bash

    env:
      GO_VERSION: '1.23.4'   # pinned; never use 'stable' or '1.x'
      IMAGE: ghcr.io/${{ github.repository }}

    jobs:
      validate:
        name: Lint & Vet
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-go@v5
            with:
              go-version: ${{ env.GO_VERSION }}
              cache: true   # caches downloaded modules and the build cache
          # Fails if gofmt would rewrite any file.
          - name: gofmt
            run: test -z "$(gofmt -l .)"
          - name: golangci-lint
            uses: golangci/golangci-lint-action@v6
            with:
              version: v1.61.0   # pinned; never use 'latest'
              args: --timeout=5m --config=.golangci.yml
          - name: go vet
            run: go vet ./...
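For intuition about what a formatting check verifies, the following Go sketch uses the standard library's `go/format` package to decide whether a source file is already in canonical gofmt style. The `isFormatted` helper and the sample sources are illustrative only; in the pipeline itself you would normally rely on `gofmt -l` or a linter.

```go
package main

import (
	"bytes"
	"fmt"
	"go/format"
)

// isFormatted reports whether src is already in canonical gofmt style.
// It returns an error if src is not syntactically valid Go.
func isFormatted(src []byte) (bool, error) {
	formatted, err := format.Source(src)
	if err != nil {
		return false, err
	}
	return bytes.Equal(formatted, src), nil
}

func main() {
	clean := []byte("package demo\n\nfunc Add(a, b int) int {\n\treturn a + b\n}\n")
	messy := []byte("package demo\n\nfunc Add(a,b int) int {\nreturn a+b\n}\n")

	ok, err := isFormatted(clean)
	fmt.Println("clean:", ok, err)

	ok, err = isFormatted(messy)
	fmt.Println("messy:", ok, err)
}
```

A formatting gate is cheap because it needs only a parse and a byte comparison, which is why it belongs in the first stage.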

Stage 2 — Build

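The build stage compiles the service once, early, so that compilation errors stop the pipeline before any slower job starts. For Go, the output is a statically linked binary with the commit SHA baked in. Uploading it as an artifact lets any downstream job that needs the compiled binary download it instead of recompiling, which saves time and guarantees those jobs exercise exactly the binary that was built. Note that the test jobs below invoke go test, which compiles the packages under test from source, so they do not need to download the artifact.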

      build:
        name: Build Binary
        needs: validate
        runs-on: ubuntu-latest
        outputs:
          version: ${{ steps.ver.outputs.sha }}
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-go@v5
            with:
              go-version: ${{ env.GO_VERSION }}
              cache: true
          - name: Compute version
            id: ver
            run: |
              SHA=$(echo "$GITHUB_SHA" | head -c8)
              echo "sha=${SHA}" >> "$GITHUB_OUTPUT"
          - name: Build static binary
            env:
              CGO_ENABLED: '0'   # fully static; no glibc dependency in the container
            run: |
              go build \
                -trimpath \
                -ldflags="-s -w -X main.Version=${{ steps.ver.outputs.sha }}" \
                -o dist/orderservice \
                ./cmd/orderservice
          - name: Upload binary artifact
            uses: actions/upload-artifact@v4
            with:
              name: orderservice-bin-${{ steps.ver.outputs.sha }}
              path: dist/orderservice
              retention-days: 3   # intermediate artifact; short TTL
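The `-X main.Version=...` linker flag only has an effect if package `main` declares a matching package-level string variable. The lesson does not show `main.go`, so the following is a sketch of what that side might look like; the `"dev"` default and the `versionLine` helper are assumptions for illustration.

```go
package main

import "fmt"

// Version is overwritten at link time by
//
//	go build -ldflags "-X main.Version=<short sha>"
//
// and keeps this default for plain `go run` or `go build` invocations.
// The linker can only set package-level string variables this way.
var Version = "dev"

// versionLine formats the string that a hypothetical --version flag or
// health endpoint might report.
func versionLine(service, version string) string {
	return fmt.Sprintf("%s version %s", service, version)
}

func main() {
	fmt.Println(versionLine("orderservice", Version))
}
```

Built without the flag, this prints `orderservice version dev`; built by the job above, it would print the 8-character commit prefix instead.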

Stages 3, 4, 5 — Parallel Gates

After the build, three jobs run concurrently. They each express needs: build — GitHub Actions will start all three the moment the build job succeeds. This is the key parallelism that keeps the pipeline under six minutes.

Unit Test: runs go test -race -coverprofile=coverage.out ./.... The -race flag enables Go's race detector, which catches data races at run time. It is not free (the Go documentation cites roughly 5-10x memory use and 2-20x execution time for a typical program), but for a unit-test suite that cost is usually acceptable. Coverage is uploaded as a CI artifact and can also be forwarded to Codecov (or similar). A coverage gate of 80% is enforced: if go tool cover -func=coverage.out shows less than 80%, the job fails.

Integration Test: starts PostgreSQL and Kafka as service containers declared in the job's services block (the CI counterpart of the docker-compose.yml used for local development). It runs the SQL migrations, then executes the integration-test suite. This is the only job with external service dependencies; keeping them isolated in one job means the other two parallel jobs are not slowed by container startup time.

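Security Scan: runs two tools. govulncheck checks your Go module graph against the Go vulnerability database and reports only vulnerabilities in functions your code actually calls, which keeps false positives low. trivy, run here in filesystem mode (scan-type: fs), scans the repository for vulnerable dependencies declared in files such as go.mod; covering OS-level CVEs in the base image additionally requires scanning the built image (scan-type: image). If trivy finds a fixable CRITICAL or HIGH vulnerability, or govulncheck reports any finding, the job fails and blocks the Publish stage.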

      unit-test:
        name: Unit Test
        needs: build
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-go@v5
            with:
              go-version: ${{ env.GO_VERSION }}
              cache: true
          - name: Run unit tests with race detector
            run: |
              go test -race -coverprofile=coverage.out -covermode=atomic ./...
              COVERAGE=$(go tool cover -func=coverage.out | grep total | awk '{print $3}' | tr -d '%')
              echo "Total coverage: ${COVERAGE}%"
              if (( $(echo "$COVERAGE < 80" | bc -l) )); then
                echo "::error::Coverage ${COVERAGE}% is below the 80% gate"
                exit 1
              fi
          - name: Upload coverage report
            uses: actions/upload-artifact@v4
            with:
              name: coverage-report
              path: coverage.out
              retention-days: 7

      integration-test:
        name: Integration Test
        needs: build
        runs-on: ubuntu-latest
        services:
          postgres:
            image: postgres:16-alpine
            env:
              POSTGRES_USER: orders
              POSTGRES_PASSWORD: orders
              POSTGRES_DB: orders_test
            # The job runs directly on the runner, so the port must be
            # published for localhost:5432 to reach the container.
            ports:
              - 5432:5432
            options: >-
              --health-cmd pg_isready
              --health-interval 5s
              --health-timeout 5s
              --health-retries 10
          kafka:
            image: bitnami/kafka:3.7
            env:
              KAFKA_CFG_NODE_ID: '0'
              KAFKA_CFG_PROCESS_ROLES: controller,broker
              KAFKA_CFG_LISTENERS: PLAINTEXT://:9092,CONTROLLER://:9093
              KAFKA_CFG_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
              KAFKA_CFG_CONTROLLER_QUORUM_VOTERS: 0@localhost:9093
              KAFKA_CFG_CONTROLLER_LISTENER_NAMES: CONTROLLER
            ports:
              - 9092:9092
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-go@v5
            with:
              go-version: ${{ env.GO_VERSION }}
              cache: true
          # Assumes a cmd/migrate entry point, which the layout above does not show.
          - name: Run DB migrations
            env:
              DATABASE_URL: postgres://orders:orders@localhost:5432/orders_test?sslmode=disable
            run: go run ./cmd/migrate up
          - name: Run integration tests
            env:
              DATABASE_URL: postgres://orders:orders@localhost:5432/orders_test?sslmode=disable
              KAFKA_BROKERS: localhost:9092
              INTEGRATION: 'true'
            run: go test -tags=integration -timeout=3m ./...
      security-scan:
        name: Security Scan
        needs: build
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-go@v5
            with:
              go-version: ${{ env.GO_VERSION }}
              cache: true
          - name: govulncheck (Go module vulnerabilities)
            run: |
              # Pin the tool version, as with every other tool in this pipeline.
              go install golang.org/x/vuln/cmd/govulncheck@v1.1.3
              govulncheck ./...
          - name: trivy (filesystem scan of declared dependencies)
            # Pin a release tag or commit SHA; never track a moving branch.
            uses: aquasecurity/trivy-action@0.28.0
            with:
              scan-type: fs
              scan-ref: .
              severity: HIGH,CRITICAL
              exit-code: '1'         # fail on HIGH or CRITICAL findings
              ignore-unfixed: true   # skip vulnerabilities with no patch yet

Stage 6 — Publish

The Publish job runs only when all three parallel gates pass. It builds the OCI image with Docker BuildKit, applies up to three tags (short SHA, branch name, and a semver tag when the run was triggered by a version tag), pushes to GitHub Container Registry, and attaches an SLSA provenance attestation. The attestation records the exact workflow run, commit, and inputs, which satisfies SLSA Level 2 out of the box on GitHub's hosted runners.

      publish:
        name: Publish Image
        # build is listed so that needs.build.outputs.version resolves;
        # the needs context only exposes direct dependencies.
        needs: [build, unit-test, integration-test, security-scan]
        # Publish only from pushes, never from pull requests.
        if: github.event_name == 'push'
        runs-on: ubuntu-latest
        permissions:
          contents: read
          packages: write
          id-token: write      # required for SLSA attestation
          attestations: write
        steps:
          - uses: actions/checkout@v4
          - name: Log in to GHCR
            uses: docker/login-action@v3
            with:
              registry: ghcr.io
              username: ${{ github.actor }}
              password: ${{ secrets.GITHUB_TOKEN }}
          # Tags: 7-character short SHA (abc1234, primary deploy tag),
          # branch name (main), and semver (2.4.1). The semver tag applies
          # only if the workflow is also triggered by git tag pushes.
          - name: Extract Docker metadata
            id: meta
            uses: docker/metadata-action@v5
            with:
              images: ${{ env.IMAGE }}
              tags: |
                type=sha,prefix=,format=short
                type=ref,event=branch
                type=semver,pattern={{version}}
          - name: Set up Docker Buildx
            uses: docker/setup-buildx-action@v3
          - name: Build and push image
            id: push
            uses: docker/build-push-action@v6
            with:
              context: .
              push: true
              tags: ${{ steps.meta.outputs.tags }}
              labels: ${{ steps.meta.outputs.labels }}
              cache-from: type=gha
              cache-to: type=gha,mode=max
              build-args: |
                VERSION=${{ needs.build.outputs.version }}
          - name: Attest build provenance (SLSA)
            uses: actions/attest-build-provenance@v1
            with:
              subject-name: ${{ env.IMAGE }}
              subject-digest: ${{ steps.push.outputs.digest }}
              push-to-registry: true
          - name: Output deploy tag
            run: |
              echo "### Artifact ready for CD" >> "$GITHUB_STEP_SUMMARY"
              echo "Image: \`${{ env.IMAGE }}@${{ steps.push.outputs.digest }}\`" >> "$GITHUB_STEP_SUMMARY"
[Figure: Secrets and permissions flow in the OrderService pipeline. Sources: GitHub Secrets (GITHUB_TOKEN, auto; SONAR_TOKEN, manual) and the OIDC token (id-token: write). Runner (publish job): GITHUB_TOKEN in env, scoped per job; SLSA attestation signed with OIDC. Destinations: GHCR registry (ghcr.io / org / image) and Sigstore / TUF, where provenance is stored. Minimal-privilege secrets flow: secrets never appear in logs.]
Secrets are injected into the runner environment at job scope; OIDC tokens are short-lived and never stored; SLSA provenance is pushed directly to the registry trust root.

Secrets Design

Every secret in this pipeline follows the principle of least privilege. The rules applied here are the same rules Google and GitHub use for their own internal pipelines:

  • GITHUB_TOKEN is auto-provisioned per job and expires when the job ends. It is scoped to only the permissions declared in the permissions block: the Publish job requests packages: write, id-token: write, and attestations: write, while earlier jobs need only contents: read, which is best declared as the workflow-level default.
  • No third-party secret (Sonar token, Slack webhook, etc.) is ever passed to a job that does not need it. Declare secrets at the job level, not at the workflow level.
  • Secrets are never echoed, interpolated into URLs, or passed as command-line arguments, which can show up in the runner's process list. Use --password-stdin for docker login, not --password $SECRET.
  • In a production workflow, pin every external action by commit SHA (actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af68), not by mutable tag; the listings in this lesson use version tags such as @v4 only for readability. Pinning prevents a compromised action publisher from injecting malicious code into your pipeline.
Never use pull_request_target with write permissions to build or run code from the pull request. This event runs in the context of the base branch, with access to the repository's secrets, so a workflow that checks out and executes the PR's code lets a malicious PR exfiltrate those secrets. For external contributors, use pull_request (forks get a read-only GITHUB_TOKEN and no repository secrets) and require maintainer approval to run privileged jobs.

Quality Gate Summary

A well-designed pipeline has explicit, documented quality gates that everyone on the team understands. Here is the gate card for OrderService:

  • lint: zero golangci-lint warnings (config in .golangci.yml, enforced on every PR)
  • unit coverage: 80% minimum total statement coverage, as reported by go tool cover
  • race detector: zero data races (enforced by the -race flag)
  • integration: all integration tests pass against real Postgres 16 and Kafka 3.7
  • vulnerabilities: no govulncheck findings in Go code the service calls, and zero fixable HIGH or CRITICAL CVEs reported by trivy
  • build reproducibility: CGO_ENABLED=0, -trimpath, pinned Go version, pinned base image digest in Dockerfile
  • artifact integrity: SLSA Level 2 provenance attached to every image pushed to main
Write your quality gates into a PIPELINE.md in the repo root. When a gate fails and a developer asks "why does this pipeline fail on 78% coverage?", a documented gate policy ends the debate instantly. Document the rationale, not just the number: "80% is the minimum required to catch regressions in the store layer, which has no contract tests."
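To illustrate what the race-detector gate on the card protects, here is a minimal Go sketch of a counter shared between goroutines. The `counter` type is illustrative and not taken from OrderService. With the mutex in place the code is race-free; if the `Lock` and `Unlock` calls were removed, `go test -race` would report a data race on `n` and the gate would fail the job.

```go
package main

import (
	"fmt"
	"sync"
)

// counter is safe for concurrent use because every access to n holds mu.
type counter struct {
	mu sync.Mutex
	n  int
}

// inc adds one to the counter under the lock.
func (c *counter) inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

// value returns the current count under the lock.
func (c *counter) value() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

// countConcurrently increments one shared counter from the given number of
// goroutines and returns the final count.
func countConcurrently(workers int) int {
	var c counter
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.inc()
		}()
	}
	wg.Wait()
	return c.value()
}

func main() {
	fmt.Println(countConcurrently(100)) // 100
}
```

An unsynchronised version often produces the right answer in local runs, which is exactly why the gate relies on the detector rather than on test results.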

Common Failure Modes in Pipeline Design

  • Missing needs chains: the Publish job runs even when security-scan failed because the author forgot to list it in needs. Always list every gate job explicitly in the final stage's needs array.
  • Hardcoded credentials in YAML: a common mistake when moving fast. Audit every new workflow for literal secrets before merging.
  • Flaky integration tests: a test that occasionally fails because of Kafka timing will erode trust in CI until developers start re-running jobs without investigating. Fix flakes immediately; treat them as bugs, not annoyances.
  • No caching strategy: re-downloading Go modules and rebuilding Docker layers from scratch on every run can add 3-4 minutes of pure wait time. Use cache: true in setup-go and cache-from: type=gha in BuildKit.
  • Overly broad permissions: granting contents: write to every job because one job needs to push a tag. Scope permissions to the minimum each job needs.
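The first failure mode, a gate missing from the final job's needs array, is easy to check mechanically. The Go sketch below compares the needs list of the final job with the list of gate jobs and reports any gate that was left out. The `missingGates` helper is hypothetical, and the job names mirror this lesson's workflow; a real check would first parse them out of the workflow file.

```go
package main

import "fmt"

// missingGates returns the gate jobs that are absent from the final job's
// needs list, in the order they appear in gates.
func missingGates(finalNeeds, gates []string) []string {
	listed := make(map[string]bool, len(finalNeeds))
	for _, n := range finalNeeds {
		listed[n] = true
	}
	var missing []string
	for _, g := range gates {
		if !listed[g] {
			missing = append(missing, g)
		}
	}
	return missing
}

func main() {
	gates := []string{"unit-test", "integration-test", "security-scan"}

	// A publish job whose author forgot the security gate.
	publishNeeds := []string{"unit-test", "integration-test"}

	fmt.Println(missingGates(publishNeeds, gates)) // [security-scan]
}
```

Run as a pre-merge check on workflow changes, a test like this turns the rule into something a reviewer cannot overlook.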