GitHub Actions in Depth

Project: A Complete CI/CD Workflow

Every lesson in this tutorial has been a building block. Now you assemble them into a production-grade CI/CD pipeline — the kind that ships code at companies like Shopify, Stripe, and GitHub itself. This lesson walks you through designing, writing, and operating a complete workflow that builds, tests, packages, and deploys a containerised Node.js API to a cloud environment, gate-keeping each stage with automated quality checks and approval controls.

The Target Architecture

The app is a REST API packaged as a Docker image, pushed to a container registry, and deployed to a Kubernetes cluster. The pipeline enforces this progression: code must pass static analysis and unit tests before an image is built; the pushed image must be scanned for CVEs before it can be deployed anywhere; a staging deployment must succeed and a smoke test must pass before production is unlocked; and production requires a named approver.

[Figure: Complete CI/CD pipeline stages. Lint & Test (unit tests) → Build Image (docker build) → Scan & Push (Trivy + GHCR) → Deploy Staging (smoke-test gate) → Approve (manual gate) → Deploy Prod (kubectl rollout). Trigger labels: push / PR, on: release.]
End-to-end CI/CD pipeline: every stage is a gated job; production requires a manual approval.

The Complete Workflow File

All five stages live in a single workflow file. The needs key enforces the dependency chain, and the environment: production setting adds the approval gate. Notice that the image tag is derived from the Git commit SHA (type=sha in the metadata step), so every build gets its own traceable tag, and that deployments reference the image by digest, which cannot be overwritten.

    # .github/workflows/cicd.yml
    name: CI/CD Pipeline

    on:
      push:
        branches: [main]
      pull_request:
        branches: [main]
      release:
        types: [published]

    env:
      REGISTRY: ghcr.io
      # Image references must be lowercase; this assumes the owner/repo name already is.
      IMAGE_NAME: ${{ github.repository }}

    jobs:
      # ── 1. LINT & TEST ──────────────────────────────────────────────
      test:
        name: Lint & Unit Tests
        runs-on: ubuntu-24.04
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-node@v4
            with:
              node-version: "20"
              cache: "npm"
          - run: npm ci
          - name: Lint
            run: npm run lint
          - name: Unit tests with coverage
            run: npm test -- --coverage --coverageThreshold='{"global":{"lines":80}}'
          - uses: actions/upload-artifact@v4
            if: always()
            with:
              name: coverage-report
              path: coverage/
              retention-days: 7

      # ── 2. BUILD IMAGE ────────────────────────────────────────────────
      build-image:
        name: Build Docker Image
        runs-on: ubuntu-24.04
        needs: test
        outputs:
          image-digest: ${{ steps.build.outputs.digest }}
          image-tag: ${{ steps.meta.outputs.tags }}
        permissions:
          contents: read
          packages: write
        steps:
          - uses: actions/checkout@v4
          - uses: docker/setup-buildx-action@v3
          - uses: docker/login-action@v3
            with:
              registry: ${{ env.REGISTRY }}
              username: ${{ github.actor }}
              password: ${{ secrets.GITHUB_TOKEN }}
          - name: Extract metadata
            id: meta
            uses: docker/metadata-action@v5
            with:
              images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
              tags: |
                type=sha,prefix=sha-,format=short
                type=ref,event=branch
                type=semver,pattern={{version}}
          - name: Build & push (cache-optimised)
            id: build
            uses: docker/build-push-action@v5
            with:
              context: .
              # Pull requests build the image but do not push it.
              push: ${{ github.event_name != 'pull_request' }}
              tags: ${{ steps.meta.outputs.tags }}
              labels: ${{ steps.meta.outputs.labels }}
              cache-from: type=gha
              cache-to: type=gha,mode=max
              provenance: true   # SLSA level-1 attestation
              sbom: true         # Software Bill of Materials

      # ── 3. SCAN ───────────────────────────────────────────────────────
      scan:
        name: CVE Scan
        runs-on: ubuntu-24.04
        needs: build-image
        # Nothing is pushed on pull requests, so there is no registry image to scan.
        if: github.event_name != 'pull_request'
        permissions:
          contents: read
          packages: read
          security-events: write
        steps:
          # Log in so the scanner can pull the image if the package is private.
          - uses: docker/login-action@v3
            with:
              registry: ${{ env.REGISTRY }}
              username: ${{ github.actor }}
              password: ${{ secrets.GITHUB_TOKEN }}
          - name: Run Trivy vulnerability scanner
            # Pin this to a release tag or commit SHA for production use.
            uses: aquasecurity/trivy-action@master
            with:
              image-ref: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}@${{ needs.build-image.outputs.image-digest }}
              format: sarif
              output: trivy-results.sarif
              severity: CRITICAL,HIGH
              exit-code: "1"   # fail the job on any CRITICAL or HIGH
          - uses: github/codeql-action/upload-sarif@v3
            if: always()
            with:
              sarif_file: trivy-results.sarif

      # ── 4. DEPLOY STAGING ────────────────────────────────────────────
      deploy-staging:
        name: Deploy to Staging
        runs-on: ubuntu-24.04
        # build-image is listed so that its outputs are readable here;
        # the needs context only exposes direct dependencies.
        needs: [build-image, scan]
        environment: staging
        # Also run on releases: a skipped staging job would skip production too.
        if: github.ref == 'refs/heads/main' || github.event_name == 'release'
        steps:
          - uses: actions/checkout@v4
          - name: Install kubectl
            uses: azure/setup-kubectl@v4
            with:
              version: "v1.30.0"
          - name: Authenticate to cluster
            env:
              KUBECONFIG_B64: ${{ secrets.STAGING_KUBECONFIG }}
            run: |
              mkdir -p ~/.kube
              echo "$KUBECONFIG_B64" | base64 -d > ~/.kube/config
              chmod 600 ~/.kube/config
          - name: Rolling update
            run: |
              IMAGE="${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}@${{ needs.build-image.outputs.image-digest }}"
              kubectl set image deployment/api api="$IMAGE" -n staging
              kubectl rollout status deployment/api -n staging --timeout=120s
          - name: Smoke test
            run: |
              STAGING_URL="https://staging.api.example.com"
              for i in 1 2 3 4 5; do
                STATUS=$(curl -s -o /dev/null -w "%{http_code}" "$STAGING_URL/healthz")
                [ "$STATUS" = "200" ] && echo "Smoke test passed" && exit 0
                echo "Attempt $i: got $STATUS, retrying in 10s..."
                sleep 10
              done
              echo "Smoke test failed after 5 attempts" && exit 1

      # ── 5. DEPLOY PRODUCTION ─────────────────────────────────────────
      deploy-production:
        name: Deploy to Production
        runs-on: ubuntu-24.04
        needs: [build-image, deploy-staging]
        environment: production   # <-- manual approval gate
        if: github.event_name == 'release'
        steps:
          - uses: actions/checkout@v4
          - uses: azure/setup-kubectl@v4
            with:
              version: "v1.30.0"
          - name: Authenticate to production cluster
            env:
              KUBECONFIG_B64: ${{ secrets.PROD_KUBECONFIG }}
            run: |
              mkdir -p ~/.kube
              echo "$KUBECONFIG_B64" | base64 -d > ~/.kube/config
              chmod 600 ~/.kube/config
          - name: Rolling update
            run: |
              IMAGE="${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}@${{ needs.build-image.outputs.image-digest }}"
              kubectl set image deployment/api api="$IMAGE" -n production
              kubectl rollout status deployment/api -n production --timeout=300s
          - name: Tag deployment in Datadog
            env:
              DD_API_KEY: ${{ secrets.DATADOG_API_KEY }}
            run: |
              curl -s -X POST "https://api.datadoghq.com/api/v1/events" \
                -H "DD-API-KEY: $DD_API_KEY" \
                -H "Content-Type: application/json" \
                -d "{\"title\":\"Deployed ${{ github.ref_name }}\",\"text\":\"SHA ${{ github.sha }}\",\"tags\":[\"env:production\"]}"
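The smoke-test loop in deploy-staging hard-codes five attempts and a ten-second sleep. As a minimal sketch, the same idea can be pulled into a small helper so the attempt count and delay become parameters. The function names retry and healthz_ok are hypothetical, not part of the lesson's workflow, and the sketch assumes curl is available on the runner.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
# (Hypothetical helper; not part of the workflow above.)
retry() {
  local attempts="$1"
  local delay="$2"
  local i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "Attempt $i/$attempts failed" >&2
    # Sleep only between attempts, not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# healthz_ok URL
# Succeeds only when the endpoint answers with HTTP 200.
healthz_ok() {
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1")
  [ "$code" = "200" ]
}
```

Inside a run step this could be called as retry 5 10 healthz_ok "https://staging.api.example.com/healthz"; a non-zero exit fails the step just as the inline loop does.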

The Dockerfile That Makes It Work

The pipeline is only as good as the Dockerfile it builds. Use a multi-stage build to keep the final image lean, and never run as root in production. The --chown flags and the USER directive enforce that: the application files belong to an unprivileged account and the process runs as that account. Treat both as mandatory.

    # Dockerfile

    # Stage 1: production dependencies only (layer cache-friendly)
    FROM node:20-alpine AS deps
    WORKDIR /app
    COPY package*.json ./
    RUN npm ci --omit=dev

    # Stage 2: build (only if your app has a compile step)
    # Installs the full dependency tree, because build tools usually live in devDependencies.
    FROM node:20-alpine AS builder
    WORKDIR /app
    COPY package*.json ./
    RUN npm ci
    COPY . .
    RUN npm run build

    # Stage 3: minimal production image
    FROM node:20-alpine AS runner
    WORKDIR /app

    # Security: non-root user that belongs to its own group
    RUN addgroup -S -g 1001 nodejs \
     && adduser -S -u 1001 -G nodejs nodeuser

    COPY --from=builder --chown=nodeuser:nodejs /app/dist ./dist
    COPY --from=deps --chown=nodeuser:nodejs /app/node_modules ./node_modules

    USER nodeuser
    EXPOSE 3000

    HEALTHCHECK --interval=15s --timeout=5s --start-period=10s --retries=3 \
      CMD wget -qO- http://localhost:3000/healthz || exit 1

    CMD ["node", "dist/server.js"]

Key Design Decisions Explained

Image identity via digest, not tag

The build-image job surfaces the image digest (a SHA-256 content hash) as an output. Every downstream job references the image by that digest, not by a mutable tag like :latest. This guarantees that the exact image deployed to staging is the exact image that reaches production, with no tag-overwrite race in between. Note that a job can read needs.build-image.outputs only if build-image is listed directly in its own needs; an indirect dependency through scan or deploy-staging is not enough.
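To make the digest reference concrete, here is a minimal sketch of a helper that assembles registry/repository@digest and rejects anything that is not a well-formed sha256 digest. The name image_ref is hypothetical. The sketch also lowercases the repository, on the assumption that github.repository may contain uppercase letters, which image references do not allow.

```shell
# image_ref REGISTRY REPOSITORY DIGEST
# Prints an immutable image reference, e.g. ghcr.io/org/repo@sha256:<64 hex chars>.
# (Hypothetical helper for illustration.)
image_ref() {
  local registry="$1"
  local digest="$3"
  local hex="${digest#sha256:}"
  local repo
  # Repository names in image references must be lowercase.
  repo=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')

  case "$digest" in
    sha256:*) ;;
    *) echo "digest must start with sha256:" >&2; return 1 ;;
  esac
  case "$hex" in
    *[!0-9a-f]*) echo "digest must be lowercase hex" >&2; return 1 ;;
  esac
  if [ "${#hex}" -ne 64 ]; then
    echo "digest must have 64 hex characters" >&2
    return 1
  fi

  printf '%s/%s@%s\n' "$registry" "$repo" "$digest"
}
```

A deploy step could then fail fast on an empty or malformed digest instead of handing kubectl a reference that ends in a bare @.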

CVE scanning as a hard gate

Trivy runs with exit-code: 1, which means any CRITICAL or HIGH CVE will fail the scan job and prevent both the staging and production deployments from starting. The SARIF results are uploaded to GitHub's Security tab so engineers can triage without leaving GitHub.

The approval environment

The environment: production declaration links to a GitHub Environment configured with Required Reviewers. When the workflow reaches that job it pauses and GitHub sends a notification to the listed reviewers. No code changes are needed to add or remove approvers — it is all managed in the repository Settings UI and is fully audited.

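The if condition on deploy-production is critical. Without if: github.event_name == 'release', every merge to main would queue a production deployment waiting for approval. Only a published GitHub Release should unlock the production job: staging deploys on every merge to main, production deploys only on a release event. Because GitHub skips a job whenever a job it needs was skipped, deploy-staging must also be allowed to run on release events. If its own if condition matches only refs/heads/main, a release (whose ref is the tag) skips staging and takes production with it.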
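The gating rules can be written down as a small truth table. The sketch below is only a model of the intended behaviour, not how GitHub evaluates conditions. It assumes every job that starts succeeds and that deploy-staging is permitted to run on release events as well as on pushes to main. The function name deploy_targets is hypothetical.

```shell
# deploy_targets EVENT_NAME REF
# Prints which environments a run is expected to deploy to.
# (Illustrative model only; assumes all started jobs succeed.)
deploy_targets() {
  local event="$1"
  local ref="$2"
  if [ "$event" = "release" ]; then
    # A published release goes through staging first, then waits for approval.
    echo "staging production"
  elif [ "$event" = "push" ] && [ "$ref" = "refs/heads/main" ]; then
    echo "staging"
  else
    # Pull requests and other refs never deploy.
    echo "none"
  fi
}
```

For example, deploy_targets push refs/heads/main prints staging, while deploy_targets pull_request refs/pull/7/merge prints none.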

Rollback Strategy

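Every pushed image is tagged with its commit SHA and is addressable by an immutable digest. Rolling back takes one command. If the previous ReplicaSet is still in the rollout history, kubectl rollout undo deployment/api -n production also works. Otherwise, find the previous successful run in the Actions UI, copy its image digest, and set the deployment back to it: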

    # Rollback: set the deployment image back to the previous known-good digest
    kubectl set image deployment/api \
      api=ghcr.io/your-org/your-repo@sha256:<previous-digest> \
      -n production
    kubectl rollout status deployment/api -n production --timeout=300s

    # Verify
    kubectl get pods -n production -l app=api -o wide

Pin Actions to a commit SHA in production pipelines. Using actions/checkout@v4 is convenient, but actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 is immutable. A tag can be moved to a different commit by anyone who gains write access to the action's repository, and the version string in your workflow does not change when that happens; a full commit SHA cannot be repointed. Tools like Dependabot and Renovate keep pins up to date automatically.
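A pinning policy is easier to keep if something checks it. As a rough sketch, the function below lists every uses: line in the given workflow files that is not pinned to a full 40-character commit SHA. The name unpinned_actions is hypothetical, local actions (./path) and docker:// references are deliberately ignored, and grep -H is assumed to be available (it is on GNU and BSD grep).

```shell
# unpinned_actions FILE...
# Prints "file:line:content" for each `uses:` reference that is not pinned
# to a 40-hex commit SHA. Prints nothing when every reference is pinned.
# (Hypothetical helper; a text-level check, not a YAML parser.)
unpinned_actions() {
  grep -HnE '^[[:space:]]*(-[[:space:]]+)?uses:[[:space:]]' "$@" \
    | grep -vE 'uses:[[:space:]]+(\./|docker://)' \
    | grep -vE '@[0-9a-f]{40}[[:space:]]*(#.*)?$' \
    || true
}
```

In CI, a step could run offenders=$(unpinned_actions .github/workflows/*.yml) and fail when the result is non-empty. A trailing comment such as # v4.2.2 after the SHA is allowed, which keeps pins readable.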

Common Production Failure Modes

  • Deployment timeout before Pods are healthy: the most likely cause is a failing readiness probe; check the app logs with kubectl logs -n staging -l app=api --since=2m before blaming the pipeline.
  • Smoke test flaps: the retry loop in the example absorbs transient load-balancer warm-up; adjust the attempt count and sleep interval to match your infrastructure's cold-start time.
  • Trivy blocks on a false positive: use .trivyignore to suppress specific CVE IDs, each with a comment explaining the decision and a review-by date.
  • GITHUB_TOKEN lacks packages: write: the permission must be declared explicitly in a permissions block, not assumed. In this workflow it is declared at the job level in build-image.
  • Stale kubeconfig: rotate the STAGING_KUBECONFIG and PROD_KUBECONFIG secrets before the service-account tokens expire. Set a calendar reminder, or use OIDC federation instead (covered in lesson 8).
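For the .trivyignore item in the list above, the "comment plus review-by date" convention can be checked mechanically. The sketch below assumes each suppression is written as a CVE ID followed by exp:YYYY-MM-DD, a suffix Trivy's documentation describes for expiring entries; confirm the exact syntax for your Trivy version. The function name lint_trivyignore is hypothetical.

```shell
# lint_trivyignore FILE [TODAY]
# Reports suppressions that lack a comment directly above them, lack an
# exp:YYYY-MM-DD date, or whose date is already in the past.
# Returns 1 if anything was reported. TODAY defaults to the current date.
# (Hypothetical helper; assumes the "ID exp:DATE" entry format.)
lint_trivyignore() {
  local file="$1"
  local today="${2:-$(date +%Y-%m-%d)}"
  local prev="" line="" id="" exp="" rc=0 n=0
  while IFS= read -r line || [ -n "$line" ]; do
    n=$((n + 1))
    case "$line" in
      '' | '#'*) prev="$line"; continue ;;
    esac
    id=${line%% *}
    case "$prev" in
      '#'*) ;;
      *) echo "$file:$n: $id has no justification comment directly above it"; rc=1 ;;
    esac
    case "$line" in
      *' exp:'[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]*)
        exp=${line##* exp:}
        exp=$(printf '%.10s' "$exp")
        # Compare dates as integers: 2025-06-01 becomes 20250601.
        if [ "$(printf '%s' "$exp" | tr -d '-')" -lt "$(printf '%s' "$today" | tr -d '-')" ]; then
          echo "$file:$n: $id suppression expired on $exp"; rc=1
        fi
        ;;
      *) echo "$file:$n: $id has no exp:YYYY-MM-DD review-by date"; rc=1 ;;
    esac
    prev="$line"
  done < "$file"
  return "$rc"
}
```

Running lint_trivyignore .trivyignore as an early step in the scan job would turn forgotten suppressions into a visible failure instead of a silent, permanent exception.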
Never print cluster credentials to the log. Decode the base64 secret directly to a file (echo "$SECRET" | base64 -d > ~/.kube/config) and set strict permissions (chmod 600). GitHub masks the exact secret value in logs, but it cannot mask transformed forms of it, so the decoded kubeconfig would appear in clear text if it were ever echoed or printed. The pattern in this lesson writes the decoded value only to disk, which is the safe approach.
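One gap in the mkdir, decode, chmod sequence is that the file briefly exists with default permissions before chmod runs. A minimal sketch that closes that window sets a restrictive umask first, so the file is created owner-only from the start. The function name write_kubeconfig is hypothetical, and base64 -d assumes GNU coreutils, as found on the Ubuntu runners.

```shell
# write_kubeconfig B64_VALUE [TARGET]
# Decodes a base64-encoded kubeconfig to TARGET (default ~/.kube/config)
# without ever printing the decoded content.
# (Hypothetical helper; assumes GNU base64.)
write_kubeconfig() {
  local b64="$1"
  local target="${2:-$HOME/.kube/config}"
  (
    # Subshell: the umask change does not leak into the caller.
    umask 077
    mkdir -p "$(dirname "$target")" &&
      printf '%s' "$b64" | base64 -d > "$target" &&
      # Covers the case where the file already existed with looser permissions.
      chmod 600 "$target"
  )
}
```

In a workflow step the secret would be mapped to an environment variable and passed as write_kubeconfig "$KUBECONFIG_B64", so the value never appears in the script text itself.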

What to Extend Next

This pipeline is a solid foundation. In a real company codebase you would add: integration and end-to-end tests as parallel jobs between test and build-image; database migration jobs with a dry-run gate; Slack / PagerDuty notifications on failure using if: failure() steps; and DORA metric emission (deployment frequency, lead time) to your observability platform. The architecture scales because every concern is its own job — adding a new gate is a matter of inserting a job with the right needs chain.