GitHub Actions in Depth

Workflows, Jobs & Steps

GitHub Actions is the CI/CD platform built directly into GitHub. Most large engineering organizations, including companies such as Google, Microsoft, Stripe, and Shopify, use it or an equivalent system. To use it effectively, you must understand its three-layer execution model: workflows contain jobs, and jobs contain steps. Confuse these layers and you will write pipelines that are brittle, slow, and hard to debug in production.

The Anatomy of a Workflow File

A workflow is a YAML file stored in your repository under .github/workflows/. GitHub scans that directory automatically, and any .yml or .yaml file there becomes a registered workflow. Every workflow must define the top-level keys on and jobs; name is optional but recommended, because without it GitHub displays the workflow's file path instead.

Here is a production-grade workflow that demonstrates every major structural element. Read through it carefully before we dissect each piece.

    # .github/workflows/ci.yml
    # Full example: build, lint, test a Python service on every PR to main.
    name: CI Pipeline

    on:
      push:
        branches: ["main", "release/**"]
        paths-ignore:
          - "docs/**"
          - "*.md"
      pull_request:
        branches: ["main"]
        types: [opened, synchronize, reopened]
      workflow_dispatch:
        inputs:
          run_slow_tests:
            description: "Include slow integration tests?"
            required: false
            default: false   # boolean inputs take a YAML boolean, not the string "false"
            type: boolean

    concurrency:
      group: ${{ github.workflow }}-${{ github.ref }}
      cancel-in-progress: true

    jobs:
      lint:
        name: Lint & Static Analysis
        runs-on: ubuntu-24.04
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-python@v5
            with:
              python-version: "3.12"
              cache: "pip"
          - run: pip install ruff mypy
          - run: ruff check .
          - run: mypy src/

      test:
        name: Unit Tests (Python ${{ matrix.python-version }})
        runs-on: ubuntu-24.04
        needs: lint
        strategy:
          fail-fast: false
          matrix:
            python-version: ["3.11", "3.12"]
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-python@v5
            with:
              python-version: ${{ matrix.python-version }}
              cache: "pip"
          - run: pip install -e ".[dev]"
          - run: pytest -x --tb=short -q
            env:
              DATABASE_URL: sqlite:///test.db

Events and Triggers: The on Key

The on key defines what causes the workflow to run. GitHub Actions supports over 35 event types. Choosing the wrong events is one of the most common pipeline design mistakes — workflows that run too broadly waste minutes and burn free quota; those that run too narrowly miss regressions.

The most important events in daily engineering work are:

  • push: fires when commits are pushed to a branch. Use a branches filter to avoid triggering on every branch in the repository. Add paths-ignore to skip documentation-only changes (your CI does not need to run because someone fixed a typo in the README).
  • pull_request: fires by default when a PR is opened, reopened, or has new commits pushed to its head branch (the synchronize activity type). This is the most important event for CI because it gates the merge. Always pair it with a branches filter so only PRs targeting important branches trigger expensive runs.
  • workflow_dispatch: fires when a human manually clicks "Run workflow" in the GitHub UI. Use this for on-demand operations: release cuts, one-off migrations, running slow test suites. The inputs block defines a form that GitHub renders in the UI.
  • schedule: fires on a cron schedule. Use for nightly security scans, dependency audits, or any check that should run independently of code changes. Timezone is always UTC.
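The schedule trigger is the only event in the list above that the main example does not use, so here is a minimal sketch of it. The workflow name, the job ID audit, the choice of pip-audit, and the requirements.txt path are all illustrative assumptions, not part of the example pipeline.

```yaml
# Hypothetical nightly audit workflow; names, tool and file path are illustrative.
name: Nightly Audit

on:
  schedule:
    # Five-field cron expression, evaluated in UTC: every day at 03:00.
    - cron: "0 3 * * *"
  # Manual trigger so the same audit can also be started on demand.
  workflow_dispatch:

jobs:
  audit:
    runs-on: ubuntu-24.04
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      # Assumes the project pins its dependencies in requirements.txt.
      - run: pip install pip-audit
      - run: pip-audit -r requirements.txt
```

Scheduled workflows run against the repository's default branch, so a sketch like this only takes effect once the file is merged there.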
Production pitfall (concurrency storms): On a busy repository, several quick pushes to the same PR branch each start a new workflow run. Without the concurrency block you can end up with five runs queued for the same PR, wasting runners and confusing status checks. Always add a concurrency group keyed to github.workflow + github.ref, with cancel-in-progress: true, so that only the latest run for a given ref stays active and older ones are cancelled automatically. Note that a push run and a pull_request run for the same commit carry different refs (refs/heads/<branch> versus refs/pull/<number>/merge), so this group does not deduplicate them; the branches filter on push is what keeps a PR branch from triggering both events.
[Figure: GitHub Actions Three-Layer Execution Model. The workflow (.github/workflows/ci.yml), triggered by push / pull_request, contains the job lint (runs-on: ubuntu-24.04; steps: checkout, setup-python, pip install ruff mypy, ruff check ., mypy src/) and, linked by needs: lint, the job test (matrix of Python 3.11 and 3.12 on two parallel ubuntu-24.04 runners; steps on each: checkout, setup-python, pip install, pytest).]
A workflow contains jobs. Jobs run on isolated runners. Steps inside a job share the same runner filesystem. The needs key makes test wait for lint to succeed; the matrix spins two parallel runners for Python 3.11 and 3.12.

Jobs: The Unit of Isolation

A job is the unit of isolation in GitHub Actions. On GitHub-hosted runners, each job runs in a fresh virtual machine: nothing from one job's filesystem, environment variables, or installed tools carries over to another job unless you pass it explicitly, through artifacts for files or job outputs for small values. This is a deliberate design choice: it prevents hidden coupling between CI stages.

Key job-level configuration fields:

  • runs-on: the runner OS and version. Prefer a specific version like ubuntu-24.04 over the mutable ubuntu-latest label. GitHub periodically moves ubuntu-latest to a newer image, and the rollout can change preinstalled tool versions under your pipeline without any change in your repository.
  • needs: a job ID, or an array of job IDs, that must complete successfully before this job starts. By default, all jobs in a workflow run in parallel. Use needs to express dependencies: test should not run if lint failed; deploy should not run if tests failed.
  • environment: links the job to a GitHub Environment (covered in a later lesson), which enables deployment protection rules and secrets scoped to specific environments (staging, production).
  • outputs: allows a job to publish key-value pairs that downstream jobs can read. Used to pass build artifact names, computed version numbers, or test results between jobs.
  • timeout-minutes: the maximum time a job is allowed to run before GitHub cancels it. The default is 360 minutes (6 hours). Always set a tighter bound; a hung integration test should not burn six hours of runner time before it is killed.
Key idea (job isolation vs step sharing): Steps inside a job share the same runner VM, the same working directory, and the same filesystem. This is why you install dependencies in one step and run tests in another: both see the same node_modules/ or venv/. Each run step starts a new shell process, however, so a variable exported in one step reaches later steps only if you append it to "$GITHUB_ENV" or declare it under env at the job or workflow level. Job B, by contrast, cannot see anything job A wrote to disk. If job A builds a Docker image and job B needs to push it, job A must upload the image as an artifact (or push it to a registry) and job B must download it.
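To make the hand-off between isolated jobs concrete, the following sketch saves a Docker image to a tarball in one job and loads it in another. The image name myapp:ci, the artifact name myapp-image, and the job ID publish are illustrative assumptions, and the final push is omitted because the target registry is not specified here.

```yaml
jobs:
  build:
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/checkout@v4
      - name: Build image and export it to a tarball
        run: |
          docker build --tag myapp:ci .
          docker save myapp:ci --output myapp-image.tar
      - name: Upload the tarball as an artifact
        uses: actions/upload-artifact@v4
        with:
          name: myapp-image
          path: myapp-image.tar
          retention-days: 1

  publish:
    runs-on: ubuntu-24.04
    needs: build   # a fresh runner: nothing from the build job's disk exists here
    steps:
      - name: Download the tarball
        uses: actions/download-artifact@v4
        with:
          name: myapp-image
      - name: Load the image into this runner's Docker daemon
        run: docker load --input myapp-image.tar
      - name: Confirm the image is available
        run: docker image inspect myapp:ci
      # A registry login and docker push would follow here.
```

For large images, pushing to a registry from the build job is often faster than round-tripping a tarball through artifact storage; the artifact route is shown because it needs no credentials.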

Steps: Where the Work Happens

A step is a single unit of work inside a job. It runs as a shell command (run) or calls a pre-built action (uses). Steps execute sequentially within a job, in the order they are defined — there is no parallelism at the step level.

Every step can optionally have:

  • name: the label shown in the GitHub UI and in log output. Always write meaningful names: "Run tests" is marginally useful, "Unit tests (src/)" is helpful, "pytest src/ -x --tb=short -q" is exactly what you need when debugging a failure at 2am.
  • id: a stable identifier used to reference this step's outputs and outcome from later steps in the same job.
  • if: a conditional expression that determines whether this step runs. Commonly used for steps that should only execute on failure (if: failure()) or on specific branches.
  • env: step-scoped environment variables, overriding job-level or workflow-level env for this step only. Never hardcode secret values here; always reference them via ${{ secrets.MY_SECRET }}.
  • continue-on-error: if true, the job continues even if this step fails. Useful for optional diagnostics like coverage reporting that should not block the pipeline.
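The optional step fields listed above can be seen together in one small job. This is a sketch under assumptions: the secret name API_TOKEN and the script scripts/upload-coverage.sh are hypothetical, and the project is assumed to install with pip install -e ".[dev]" as in the main example.

```yaml
jobs:
  test:
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/checkout@v4
      - run: pip install -e ".[dev]"

      - name: Unit tests (src/)
        id: unit                      # lets later steps read steps.unit.outcome
        run: pytest src/ -x --tb=short -q
        env:
          # Hypothetical secret, referenced rather than hardcoded.
          API_TOKEN: ${{ secrets.API_TOKEN }}

      - name: Upload coverage report (optional)
        continue-on-error: true       # a failure here does not fail the job
        run: ./scripts/upload-coverage.sh   # hypothetical script

      - name: Report unit-test outcome on main
        if: github.ref == 'refs/heads/main'
        run: echo "Unit step outcome was ${{ steps.unit.outcome }}"

      - name: Show disk usage after a failure
        if: failure()                 # runs only when an earlier step failed
        run: df -h
```

A step without a status function in its if condition is skipped once an earlier step has failed, which is why the diagnostic step needs failure() explicitly.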
    # Advanced step patterns: outputs, conditionals, and failure handlers
    jobs:
      build:
        runs-on: ubuntu-24.04
        outputs:
          image_tag: ${{ steps.meta.outputs.version }}
        steps:
          - uses: actions/checkout@v4

          # Step with an id so its output can be referenced
          - name: Compute image tag
            id: meta
            run: |
              VERSION=$(git describe --tags --always --dirty)
              echo "version=${VERSION}" >> "$GITHUB_OUTPUT"

          - name: Build Docker image
            run: |
              docker build \
                --tag "myapp:${{ steps.meta.outputs.version }}" \
                --build-arg GIT_SHA=${{ github.sha }} \
                .

          # Runs only if an earlier step in this job failed: diagnostic artifact upload
          - name: Upload build logs on failure
            if: failure()
            uses: actions/upload-artifact@v4
            with:
              name: build-logs
              path: /tmp/build-*.log
              retention-days: 7
Pro practice (use GITHUB_OUTPUT, not set-output): The old echo "::set-output name=key::value" syntax is deprecated; it produces a warning in the run log, and GitHub has announced that it will eventually be disabled. Always write outputs using the environment file approach: echo "key=value" >> "$GITHUB_OUTPUT". Similarly, use echo "MY_VAR=value" >> "$GITHUB_ENV" to set environment variables that persist across steps in the same job. Both GITHUB_OUTPUT and GITHUB_ENV are temporary files created by the runner; appending to them is the supported mechanism.
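The two environment files behave differently in scope, which a short sketch makes visible: GITHUB_ENV reaches later steps in the same job, while a step output must be promoted to a job output before another job can read it. The variable name APP_VERSION and the job ID report are illustrative assumptions.

```yaml
jobs:
  build:
    runs-on: ubuntu-24.04
    outputs:
      image_tag: ${{ steps.meta.outputs.version }}
    steps:
      - uses: actions/checkout@v4
      - name: Compute version and share it
        id: meta
        run: |
          VERSION=$(git describe --tags --always --dirty)
          # Step output: readable as steps.meta.outputs.version
          echo "version=${VERSION}" >> "$GITHUB_OUTPUT"
          # Environment variable for later steps in this job only
          echo "APP_VERSION=${VERSION}" >> "$GITHUB_ENV"
      - name: Read the variable in a later step
        run: echo "Building ${APP_VERSION}"

  report:
    runs-on: ubuntu-24.04
    needs: build
    steps:
      - name: Read the job output from the upstream job
        env:
          IMAGE_TAG: ${{ needs.build.outputs.image_tag }}
        run: echo "Upstream built ${IMAGE_TAG}"
```

Passing the value through env and reading it as a shell variable, rather than interpolating the expression directly into the script, keeps unexpected characters in the value from being interpreted as shell code.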

The Dependency Graph: Designing Job Order

At big-tech scale, the job dependency graph is the single biggest lever for pipeline speed. The mental model is a directed acyclic graph (DAG): jobs with no dependencies run in parallel; jobs with needs form chains. A well-designed pipeline runs as much as possible in parallel, and chains only what is truly dependent.

A common production pattern for a web service:

  • Parallel first stage: lint, type-check, dependency-audit — all run simultaneously with no dependencies. They are all cheap (under 2 minutes) and entirely independent.
  • Second stage (needs first stage): unit-tests — runs only if lint and type-check pass. Running tests after a lint failure is pointless noise.
  • Third stage (needs unit-tests): build — compile and package the artifact.
  • Fourth stage (needs build): integration-tests — spin up dependencies (database, message broker) and test the packaged artifact end-to-end.
  • Fifth stage (needs integration-tests, on main only): deploy-staging — deploy to a staging environment, gated by a GitHub Environment approval.
Common failure mode (sequential everything): Engineers new to GitHub Actions often put every logical step in a single job, running sequentially. This looks simple but has two serious problems: (1) nothing runs in parallel, so total pipeline time is the sum of every check, and the first failing step hides the results of all the checks after it; (2) a single failed job gives you no at-a-glance signal about which class of check failed. Split into jobs by concern and let the DAG give you parallel speed and precise failure attribution.
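The five-stage pattern described above can be sketched as a skeleton in which only the needs edges matter. The make targets are hypothetical placeholders for whatever each stage actually runs, and the environment name staging is assumed to exist in the repository settings.

```yaml
name: Pipeline skeleton
on:
  push:
    branches: ["main"]
  pull_request:
    branches: ["main"]

jobs:
  # Stage 1: no needs, so these three start in parallel.
  lint:
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/checkout@v4
      - run: make lint               # hypothetical target
  type-check:
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/checkout@v4
      - run: make type-check         # hypothetical target
  dependency-audit:
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/checkout@v4
      - run: make audit              # hypothetical target

  # Stage 2: waits for lint and type-check to succeed.
  unit-tests:
    runs-on: ubuntu-24.04
    needs: [lint, type-check]
    steps:
      - uses: actions/checkout@v4
      - run: make test               # hypothetical target

  # Stage 3
  build:
    runs-on: ubuntu-24.04
    needs: unit-tests
    steps:
      - uses: actions/checkout@v4
      - run: make build              # hypothetical target

  # Stage 4
  integration-tests:
    runs-on: ubuntu-24.04
    needs: build
    steps:
      - uses: actions/checkout@v4
      - run: make integration-test   # hypothetical target

  # Stage 5: main only, gated by the environment's protection rules.
  deploy-staging:
    runs-on: ubuntu-24.04
    needs: integration-tests
    if: github.ref == 'refs/heads/main'
    environment: staging
    steps:
      - uses: actions/checkout@v4
      - run: make deploy-staging     # hypothetical target
```

On pull_request runs github.ref is the PR merge ref rather than refs/heads/main, so the deploy job is skipped there without any extra event check.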

Common Production Patterns

A few workflow-level settings that every production pipeline should include:

    # Production workflow skeleton: settings that should be in every pipeline
    name: CI/CD

    on:
      push:
        branches: ["main"]
      pull_request:
        branches: ["main"]

    # Cancel any in-progress run for the same branch when a new one starts.
    # Prevents queuing five runs for five quick-succession commits to the same PR.
    concurrency:
      group: ${{ github.workflow }}-${{ github.ref }}
      cancel-in-progress: true

    # Principle of least privilege: default to read-only token permissions.
    # Grant write only where a specific job requires it (e.g., deploy, package publish).
    permissions:
      contents: read

    jobs:
      test:
        runs-on: ubuntu-24.04
        timeout-minutes: 20   # Never leave the default 360 minutes.
        permissions:
          contents: read      # Job-level override; only what this job actually needs.
        steps:
          - uses: actions/checkout@v4
            with:
              # Shallow clone (default depth 1) is fastest for CI.
              # Increase only if you need git history (e.g., git describe for versioning).
              fetch-depth: 1
          - name: Run tests
            run: make test

Every setting in that skeleton has a reason. concurrency prevents runner waste. permissions: contents: read follows the principle of least privilege: depending on repository and organization settings, the default GITHUB_TOKEN can carry write access that a test job never needs, and a compromised action in your dependency chain should not be able to push to your repository. timeout-minutes: 20 is a safety net against hung processes that would otherwise consume runner capacity for hours.
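To show what "grant write only where a specific job requires it" can look like, here is a sketch with a read-only default and one job that widens its own token. The job ID publish and the choice of GitHub Container Registry (ghcr.io) as the target are assumptions made for illustration, as is the presence of a Dockerfile at the repository root.

```yaml
name: CI/CD with scoped permissions
on:
  push:
    branches: ["main"]

# Workflow-wide default: read-only.
permissions:
  contents: read

jobs:
  test:
    runs-on: ubuntu-24.04
    timeout-minutes: 20
    steps:
      - uses: actions/checkout@v4
      - run: make test

  publish:
    runs-on: ubuntu-24.04
    timeout-minutes: 15
    needs: test
    # Job-level block replaces the default; scopes not listed here are set to none.
    permissions:
      contents: read     # still needed for checkout
      packages: write    # the only extra scope this job needs
    steps:
      - uses: actions/checkout@v4
      - name: Log in to the container registry
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: echo "$GITHUB_TOKEN" | docker login ghcr.io --username "$GITHUB_ACTOR" --password-stdin
      - name: Build and push
        run: |
          # Image names must be lowercase; ${VAR,,} is bash lowercase expansion.
          IMAGE="ghcr.io/${GITHUB_REPOSITORY,,}:${GITHUB_SHA}"
          docker build --tag "$IMAGE" .
          docker push "$IMAGE"
```

The test job never sees a token that can write packages, so a compromised test dependency cannot publish an image.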

Understanding workflows, jobs, and steps at this depth is the foundation for everything that follows in this tutorial: matrix builds, caching, reusable workflows, OIDC authentication, and secure deployments all build directly on this three-layer model. Get the structure right, and the rest of the tutorial makes sense. Get it wrong, and you will fight the tool at every turn.