Project: Design a CI Pipeline
The best way to cement everything you have learned about continuous integration is to design a complete pipeline from scratch — not a "hello world" workflow, but a production-grade spec for a real service. This lesson walks you through that design exercise end-to-end: you will choose a sample service, enumerate the pipeline's jobs, wire them together, add security gates, handle secrets, and produce a specification that a senior engineer at Google or GitHub would recognise as production-ready.
The Sample Service
We will design a CI pipeline for OrderService, a microservice written in Go that exposes a REST API, writes to PostgreSQL, publishes events to a Kafka topic, and is deployed as a container on Kubernetes. This stack is representative of the backend services you will encounter at scale. The repository layout is:
Pipeline Goals
Before writing a single YAML line, write down your goals. Every stage you add must serve at least one of these:
- Fast feedback — developers learn within 5 minutes whether their change is correct.
- Consistent environment — the build is hermetic; it produces the same output regardless of who triggers it.
- Security gates — no secret leaks; all dependencies scanned for CVEs; SLSA provenance attached.
- Deployable artifact — the pipeline's final output is an OCI image pushed to a registry, tagged by commit SHA, ready to be picked up by CD.
Stage-by-Stage Pipeline Design
The pipeline has six jobs. They are not all sequential — parallelism is the key to meeting the 5-minute target. Here is the full dependency graph:
Stage 1 — Validate (lint, vet, format)
Validation is the fastest stage and should fail loudest. It catches style violations, unused imports, shadowed variables, and unreachable code before any time is spent on compilation. Running it first means that a developer who forgot to run gofmt gets feedback in under a minute instead of after a full build.
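As a rough sketch, the Validate job might look like the fragment below. The job id validate, the action version tags, and the golangci-lint version are assumptions for illustration; a real workflow would pin each action to a full commit SHA.

```yaml
jobs:
  validate:
    runs-on: ubuntu-latest
    permissions:
      contents: read
    steps:
      # Version tags are shown for readability; pin to a full commit SHA in practice.
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version-file: go.mod   # assumes go.mod declares the Go version
      - name: Check formatting
        run: |
          # gofmt -l lists files that need reformatting; fail if the list is non-empty
          unformatted="$(gofmt -l .)"
          if [ -n "$unformatted" ]; then
            echo "Run gofmt on:"
            echo "$unformatted"
            exit 1
          fi
      - name: Vet
        run: go vet ./...
      - name: Lint
        uses: golangci/golangci-lint-action@v6
        with:
          version: v1.61            # assumed version; rules come from .golangci.yml
```

Ordering the steps from cheapest to most expensive keeps the most common failure, an unformatted file, at the front of the job.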
Stage 2 — Build
The build stage produces the binary that downstream jobs depend on. For Go, the output is a statically linked binary with the commit SHA baked in. Upload it as an artifact so the integration-test job does not re-compile from source: this saves time and guarantees that the binary you test is the exact binary you built.
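One possible shape for the Build job is sketched below. The package path ./cmd/orderservice, the main.commit variable, and the artifact name are hypothetical, since the repository layout is not shown here.

```yaml
jobs:
  build:
    needs: validate
    runs-on: ubuntu-latest
    permissions:
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version-file: go.mod
          cache: true                      # reuse the Go module and build caches
      - name: Build static binary
        env:
          CGO_ENABLED: "0"                 # no cgo, so the binary is statically linked
        run: |
          # main.commit is a hypothetical string variable in package main
          go build -trimpath \
            -ldflags "-X main.commit=${GITHUB_SHA}" \
            -o bin/orderservice ./cmd/orderservice   # hypothetical package path
      - uses: actions/upload-artifact@v4
        with:
          name: orderservice-${{ github.sha }}
          path: bin/orderservice
          retention-days: 1                # the binary only needs to outlive this run
```

Naming the artifact after the commit SHA makes it hard for a downstream job to pick up a binary from a different revision by mistake.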
Stages 3, 4, 5 — Parallel Gates
After the build, three jobs run concurrently. Each declares needs: build, so GitHub Actions starts all three the moment the build job succeeds. This parallelism is what keeps the pipeline within the five-minute target.
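The fan-out and fan-in can be sketched with nothing but needs declarations. The job ids below are assumed names for the six jobs, and the echo steps are placeholders for the real work.

```yaml
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - run: echo "lint, vet, format"
  build:
    needs: validate
    runs-on: ubuntu-latest
    steps:
      - run: echo "compile and upload the binary"
  # The next three jobs depend only on build, so they start together.
  unit-test:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - run: echo "go test -race"
  integration-test:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - run: echo "tests against Postgres and Kafka"
  security-scan:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - run: echo "govulncheck and trivy"
  # Fan-in: publish is skipped unless every listed gate succeeded.
  publish:
    needs: [unit-test, integration-test, security-scan]
    runs-on: ubuntu-latest
    steps:
      - run: echo "build, push, and attest the image"
```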
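Unit Test: checks out the source and runs go test -race -coverprofile=coverage.out ./... (go test compiles the packages under test itself, so this job does not need the binary artifact). The -race flag enables Go's race detector, which catches data races at a real cost: the Go documentation cites roughly 2-20x execution time and 5-10x memory, so budget for it within the five-minute target. Coverage is uploaded as a CI artifact and also forwarded to Codecov (or similar). A coverage gate of 80% is enforced: if the total reported by go tool cover -func=coverage.out is below 80%, the job fails.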
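A minimal sketch of the unit-test job with the coverage gate follows. The job id, the THRESHOLD variable, and the artifact name are assumptions; the awk expression relies on the total line that go tool cover -func prints last.

```yaml
jobs:
  unit-test:
    needs: build
    runs-on: ubuntu-latest
    permissions:
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version-file: go.mod
      - name: Run unit tests with the race detector
        run: go test -race -coverprofile=coverage.out ./...
      - name: Enforce coverage gate
        env:
          THRESHOLD: "80"
        run: |
          # The summary line has the form: total: (statements) NN.N%
          total="$(go tool cover -func=coverage.out | awk '/^total:/ { sub(/%/, "", $3); print $3 }')"
          echo "Total coverage: ${total}% (minimum ${THRESHOLD}%)"
          # Compare as numbers; a non-zero exit fails the step
          awk -v t="$total" -v min="$THRESHOLD" 'BEGIN { exit (t + 0 < min + 0) ? 1 : 0 }'
      - uses: actions/upload-artifact@v4
        if: always()                 # keep the profile even when the gate fails
        with:
          name: coverage
          path: coverage.out
```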
Integration Test: starts PostgreSQL and Kafka as service containers declared in the job's services block (GitHub Actions manages these containers itself, so no Docker Compose file is involved). It runs the SQL migrations, then executes the integration-test suite. This is the only job with external service dependencies, and keeping them isolated to one job means the other two parallel jobs are not slowed by container startup time.
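The integration-test job could be sketched as below. The Kafka image, the database name, the environment variable names, the migration script, and the integration build tag are all assumptions made for illustration, not details taken from OrderService.

```yaml
jobs:
  integration-test:
    needs: build
    runs-on: ubuntu-latest
    permissions:
      contents: read
    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_PASSWORD: postgres    # throwaway credential for the ephemeral container
          POSTGRES_DB: orders            # assumed database name
        ports:
          - 5432:5432
        # Hold the job's steps until Postgres accepts connections
        options: >-
          --health-cmd "pg_isready -U postgres"
          --health-interval 5s
          --health-timeout 5s
          --health-retries 10
      kafka:
        image: apache/kafka:3.7.0        # assumed image, run with its default single-node config
        ports:
          - 9092:9092
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version-file: go.mod
      - uses: actions/download-artifact@v4
        with:
          name: orderservice-${{ github.sha }}
          path: bin
      - run: chmod +x bin/orderservice   # artifacts do not preserve the executable bit
      - name: Run migrations and integration tests
        env:
          DATABASE_URL: postgres://postgres:postgres@localhost:5432/orders?sslmode=disable
          KAFKA_BROKERS: localhost:9092
        run: |
          ./scripts/migrate.sh             # hypothetical migration script
          go test -tags=integration ./...  # assumes integration tests sit behind this build tag
```

The Kafka container has no health check here, so the test suite would need its own retry on first connection.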
Security Scan: runs two tools. govulncheck checks your Go module graph against the Go vulnerability database and reports only vulnerabilities in functions your code actually calls, which keeps noise low. trivy scans the container image for OS-level CVEs inherited from the base image. The job fails, and blocks the Publish stage, if govulncheck reports any reachable vulnerability or trivy reports any CRITICAL or HIGH finding.
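A sketch of the security-scan job is shown below, assuming the Dockerfile sits at the repository root. The local image tag orderservice:scan and the action version are illustrative choices.

```yaml
jobs:
  security-scan:
    needs: build
    runs-on: ubuntu-latest
    permissions:
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version-file: go.mod
      - name: Scan Go dependencies
        # Exits non-zero when a vulnerability is found; pin a version instead of @latest in practice
        run: go run golang.org/x/vuln/cmd/govulncheck@latest ./...
      - name: Build image for scanning
        run: docker build -t orderservice:scan .   # local tag, never pushed
      - name: Scan image for OS-level CVEs
        uses: aquasecurity/trivy-action@0.28.0
        with:
          image-ref: orderservice:scan
          severity: CRITICAL,HIGH
          exit-code: "1"                           # fail the job on any matching finding
```

Scanning a locally built image rather than the Dockerfile text means the result reflects the packages that would actually ship.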
Stage 6 — Publish
The Publish job runs only when all three parallel gates pass. It builds the OCI image with Docker BuildKit, applies up to three tags (short SHA, branch name, and a semver tag when triggered by a Git tag), pushes to GitHub Container Registry, and attaches an SLSA provenance attestation. The attestation records the exact workflow run, commit, and inputs, which satisfies SLSA Level 2 out of the box with GitHub's hosted runners.
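The Publish job might be assembled from the Docker and GitHub actions as sketched below. Action versions and the image name are assumptions. The attestations: write permission goes beyond the two permissions named in this lesson; GitHub's attestation action requires it.

```yaml
jobs:
  publish:
    needs: [unit-test, integration-test, security-scan]
    if: github.event_name == 'push'       # never publish from pull requests
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write                     # push to GHCR
      id-token: write                     # OIDC token used to sign the attestation
      attestations: write                 # store the attestation
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - id: meta
        uses: docker/metadata-action@v5
        with:
          images: ghcr.io/${{ github.repository }}   # assumes a lowercase owner/repo name
          tags: |
            type=sha
            type=ref,event=branch
            type=semver,pattern={{version}}
      - id: push
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
      - uses: actions/attest-build-provenance@v2
        with:
          subject-name: ghcr.io/${{ github.repository }}
          subject-digest: ${{ steps.push.outputs.digest }}
          push-to-registry: true
```

Attesting the digest rather than a tag ties the provenance to the exact image bytes, whichever tags later point at them.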
Secrets Design
Every secret in this pipeline follows the principle of least privilege. The rules applied here follow widely published hardening guidance for GitHub Actions:
- GITHUB_TOKEN is auto-provisioned per job and expires when the job ends. It is scoped to only the permissions declared in the permissions block: the Publish job requests packages: write and id-token: write; earlier jobs have only contents: read.
- No third-party secret (Sonar token, Slack webhook, etc.) is ever passed to a job that does not need it. Declare secrets at the job level, not at the workflow level.
- Secrets are never echoed, interpolated into URLs, or passed as command-line arguments, which can appear in the runner's process list. Use --password-stdin for docker login, not --password $SECRET.
- All external actions are pinned by SHA (actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af68), not by mutable tag. This prevents a compromised action publisher from injecting malicious code into your pipeline.
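To make the first and third rules concrete, here is a small sketch of a read-only workflow default, a job-level override, and a registry login that reads the token from stdin. The variable name REGISTRY_TOKEN is an assumption.

```yaml
permissions:
  contents: read              # workflow-wide default: read-only

jobs:
  publish:
    runs-on: ubuntu-latest
    permissions:              # replaces the default for this job only
      contents: read
      packages: write
      id-token: write
    steps:
      - name: Log in to GHCR
        env:
          REGISTRY_TOKEN: ${{ secrets.GITHUB_TOKEN }}   # exposed to this step only
        # echo is a shell builtin, so the token never appears in any process's argument list
        run: echo "$REGISTRY_TOKEN" | docker login ghcr.io -u "$GITHUB_ACTOR" --password-stdin
```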
Avoid pull_request_target with write permissions. This event runs in the context of the base branch, with access to the repository's secrets, so a workflow that checks out and runs code from a malicious PR can exfiltrate every secret available to it. For external contributors, use pull_request (workflows triggered from forks receive no secrets and a read-only token) and require maintainer approval to run privileged jobs.
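A trigger block along these lines keeps untrusted pull requests on the unprivileged path. The branch name, tag pattern, and job id are assumptions.

```yaml
on:
  pull_request:               # fork PRs run with a read-only token and no repository secrets
  push:
    branches: [main]
    tags: ["v*"]

permissions:
  contents: read

jobs:
  publish:
    # Privileged job: runs for pushes to main and version tags, never for pull requests
    if: github.event_name == 'push'
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - run: echo "build and push the image"
```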
Quality Gate Summary
A well-designed pipeline has explicit, documented quality gates that everyone on the team understands. Here is the gate card for OrderService:
- lint: zero golangci-lint warnings (config in .golangci.yml, enforced on every PR)
- unit coverage: 80% minimum total statement coverage, as reported by go tool cover
- race detector: zero data races (enforced by the -race flag)
- integration: all integration tests pass against real Postgres 16 and Kafka 3.7
- vulnerabilities: zero HIGH or CRITICAL CVEs in Go modules or base image
- build reproducibility: CGO_ENABLED=0, -trimpath, pinned Go version, pinned base image digest in Dockerfile
- artifact integrity: SLSA Level 2 provenance attached to every image pushed to main
Document these gates in PIPELINE.md in the repo root. When a gate fails and a developer asks "why does this pipeline fail on 78% coverage?", a documented gate policy ends the debate instantly. Document the rationale, not just the number: "80% is the minimum required to catch regressions in the store layer, which has no contract tests."
Common Failure Modes in Pipeline Design
- Missing needs chains: the Publish job runs even when security-scan failed because the author forgot to list it in needs. Always list every gate job explicitly in the final stage's needs array.
- Hardcoded credentials in YAML: a common mistake when moving fast. Audit every new workflow for literal secrets before merging.
- Flaky integration tests: a test that occasionally times out waiting on Kafka will erode trust in CI until developers start re-running jobs without investigating. Fix flakes immediately; treat them as bugs, not annoyances.
- No caching strategy: re-downloading Go modules and rebuilding Docker layers from scratch on every run can add 3-4 minutes of pure wait time. Use cache: true in setup-go and cache-from: type=gha in BuildKit.
- Overly broad permissions: granting contents: write to every job because one job needs to push a tag. Scope permissions to the minimum each job needs.