DevOps Culture & Fundamentals

The DORA Metrics

Lesson 5 of 28


In 2014, the DevOps Research and Assessment (DORA) programme began what has become one of the largest and longest-running studies of software delivery performance. By 2023 the dataset covered more than 36,000 professionals across thousands of organisations worldwide. The study's central finding: four metrics reliably distinguish high-performing teams from low-performing ones, and strong performance on them predicts better organisational outcomes, including profitability.

Deployment Frequency, Lead Time for Changes, Mean Time to Restore (MTTR), and Change Failure Rate are now the industry-standard vocabulary for measuring an engineering team's delivery capability. Google Cloud, which acquired DORA in 2018, continues to publish the research, and the metrics are built into many delivery and engineering-analytics tools. If you are joining a mature DevOps team, expect to hear them discussed early on.

Key idea: The DORA metrics are outcome metrics, not activity metrics. They measure the results of your process, not whether your team is busy. A team that deploys once a month but also takes three days to recover from an outage is not a high performer — even if they work 70-hour weeks.

Metric 1 — Deployment Frequency

Definition: How often does your team deploy to production?

Deployment frequency is a proxy for batch size. Teams that deploy once a quarter are batching months of work into a single risky release. Teams that deploy multiple times per day are shipping tiny increments — each deployment is so small that reverting it is trivial.

DORA performance bands (2023 report):

Elite: On-demand (multiple deploys per day)
High: Between once per day and once per week
Medium: Between once per week and once per month
Low: Less than once per month
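To make the bands concrete, here is a minimal Python sketch that maps a deployment count over an observation window to one of the four bands. The thresholds are a simplification chosen for illustration (more than one deploy per day on average counts as Elite, at least weekly as High, at least monthly as Medium), and the function name `deployment_frequency_band` is hypothetical, not part of any DORA tool.

```python
def deployment_frequency_band(deploy_count: int, period_days: int) -> str:
    """Map deployments observed over a window to a simplified DORA band.

    Assumed thresholds (illustrative, not official cut-offs):
    > 1 deploy/day -> Elite, >= weekly -> High, >= monthly -> Medium, else Low.
    """
    if period_days <= 0:
        raise ValueError("period_days must be positive")
    per_day = deploy_count / period_days
    if per_day > 1:
        return "Elite"
    if per_day >= 1 / 7:
        return "High"
    if per_day >= 1 / 30:
        return "Medium"
    return "Low"


print(deployment_frequency_band(60, 30))  # two deploys per day on average
```

Averages hide bursts: a team that ships 30 changes on a single day each month would score High here, even though its real cadence is monthly. Looking at daily counts over a longer window gives a truer picture.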

At the largest web companies, deploying many times per day is routine: Amazon reported in 2011 that it averaged one production deployment every 11.6 seconds across its services. This is only possible because deployments are automated, tested, and limited in blast radius by feature flags and canary releases.

How to measure it: Count the number of successful production deployments per day (or week). Use your CI/CD platform's built-in reporting, or query your deployment log directly.

# Query deployment frequency from a git-tag convention
# Convention: every production deploy creates a tag like "deploy/2025-06-10-1423"

# Take the YYYY-MM-DD part of each tag, count deploys per day,
# then show the 30 most recent days (newest first).
git tag --list 'deploy/*' \
  | awk -F/ '{print substr($2, 1, 10)}' \
  | sort | uniq -c \
  | sort -k2,2r \
  | head -30
# Output: "  14 2025-06-10" means 14 deploys on that day
# Days with no deploys do not appear in the output.
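One limitation of the pipeline above is that days with zero deploys simply vanish, which inflates any average computed from its output. A minimal Python sketch that fills those gaps, assuming the same `deploy/YYYY-MM-DD-HHMM` tag convention (the helper name `daily_deploy_counts` is illustrative):

```python
from collections import Counter
from datetime import date, timedelta


def daily_deploy_counts(tags: list[str]) -> dict[date, int]:
    """Count deploys per day from tags like "deploy/2025-06-10-1423",
    including days with zero deploys (which `uniq -c` silently omits)."""
    days = [
        date.fromisoformat(tag.split("/", 1)[1][:10])
        for tag in tags
        if tag.startswith("deploy/")
    ]
    if not days:
        return {}
    counts = Counter(days)
    result = {}
    day, last = min(days), max(days)
    while day <= last:
        result[day] = counts[day]  # Counter returns 0 for days with no deploys
        day += timedelta(days=1)
    return result
```

In a real repository you could feed it the tag list from git, for example `subprocess.run(["git", "tag", "--list", "deploy/*"], capture_output=True, text=True).stdout.split()`.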

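# GitHub Actions: record a GitHub deployment on every successful push to main
# .github/workflows/deploy.yml (excerpt)
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      contents: read       # needed by actions/checkout
      deployments: write   # needed to create the deployment record
    steps:
      - uses: actions/checkout@v4
      - name: Deploy
        run: ./scripts/deploy.sh
      # Skipped if the Deploy step fails, so each record marks a completed deploy
      - name: Record deployment
        uses: chrnorm/deployment-action@v2
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
          environment: production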

Metric 2 — Lead Time for Changes

Definition: How long does it take for a committed change to reach production?

Lead time measures the speed of your feedback loop. A short lead time means developers know quickly whether their code works in production. A long lead time means bugs sit in flight for days or weeks before discovery.

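DORA performance bands (2021 report):

Elite: Less than one hour
High: Between one day and one week
Medium: Between one month and six months
Low: More than six months

Because the bands summarise clusters of survey responses, they do not always meet end to end (note the gap between one hour and one day).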

Lead time is not just a speed metric; it is also a risk metric. A six-month lead time means each release bundles months of changes whose interactions have never run together in production. A one-hour lead time means you typically ship one small, well-understood change at a time.

Lead time runs from the moment a change is committed until it is live in production; elite teams achieve this in under one hour.
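A minimal Python sketch of the lead-time calculation, assuming you can export a commit timestamp and a production-deploy timestamp for each change (the sample data and the function name `lead_times_hours` are illustrative). It reports the median rather than the mean, because a single change that sits in a queue for days would otherwise dominate the result.

```python
from datetime import datetime
from statistics import median


def lead_times_hours(changes: list[tuple[str, str]]) -> list[float]:
    """Return commit-to-production lead time in hours for each change.

    Each tuple is (commit_ts, deployed_ts) as ISO-8601 strings with an
    explicit UTC offset, e.g. "2025-06-10T09:00:00+00:00".
    """
    result = []
    for committed, deployed in changes:
        delta = datetime.fromisoformat(deployed) - datetime.fromisoformat(committed)
        result.append(delta.total_seconds() / 3600)
    return result


# Hypothetical sample data
changes = [
    ("2025-06-10T09:00:00+00:00", "2025-06-10T09:40:00+00:00"),  # 40 minutes
    ("2025-06-10T11:00:00+00:00", "2025-06-10T12:30:00+00:00"),  # 90 minutes
    ("2025-06-10T13:00:00+00:00", "2025-06-13T13:00:00+00:00"),  # 3 days
]
print(f"median lead time: {median(lead_times_hours(changes)):.2f} h")
```

For these three sample changes the median is 1.5 hours, while the mean would be about 24.7 hours.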

Metric 3 — Mean Time to Restore (MTTR)

Definition: When a service incident occurs, how long does it take to restore service to users?

MTTR is the most direct measure of your team's operational resilience. It captures your ability to detect a problem, understand it, and fix or mitigate it quickly. The fix does not have to be a code change — it can be a rollback, a feature flag toggle, a config change, or a traffic reroute.

DORA performance bands (2021 report):

Elite: Less than one hour
High: Less than one day
Medium: Between one day and one week
Low: More than six months

Production pitfall: Many teams measure time-to-fix (when the code patch ships) rather than time-to-restore (when user impact ends). If your service degrades at 14:00, a rollback completes at 14:12, and the code fix ships at 16:00, the time to restore for that incident is 12 minutes, not 2 hours. Always measure from impact start to service restored, not to root cause fixed.
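The arithmetic in the pitfall above, written out as a small Python sketch. The clock times come from the example; `minutes_between` is a hypothetical helper that assumes both times fall on the same day.

```python
from datetime import datetime


def minutes_between(start: str, end: str) -> float:
    """Minutes between two same-day HH:MM clock times."""
    fmt = "%H:%M"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds() / 60


impact_start = "14:00"      # service degrades
service_restored = "14:12"  # rollback complete, users unaffected again
fix_deployed = "16:00"      # code patch ships later

time_to_restore = minutes_between(impact_start, service_restored)  # counts toward MTTR
time_to_fix = minutes_between(impact_start, fix_deployed)          # does not
print(f"time to restore: {time_to_restore:.0f} min")
print(f"time to fix:     {time_to_fix:.0f} min")
```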

Elite teams achieve sub-hour MTTR by maintaining three operational capabilities:

Fast detection: Alerting that fires within seconds of user-visible impact (not when a human notices).
Fast rollback: The ability to revert a deployment in under 5 minutes via a single CLI command or dashboard button.
Feature flags: The ability to disable a broken feature for all users (0% exposure) without a deployment at all.
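A feature-flag kill switch can be sketched in a few lines. This is a minimal, hypothetical in-memory version in Python, not the API of any particular flag service: each flag has a rollout percentage, users are bucketed deterministically by hashing, and setting the percentage to 0 turns the feature off for everyone without a deploy. The flag name `promo-code-validation` echoes the rollback example that follows, and `ROLLOUT_PERCENT`, `is_enabled`, and `kill_switch` are illustrative names.

```python
import hashlib

# Hypothetical in-memory flag store: flag name -> percentage of users exposed.
# A real flag service stores this centrally so it can change without a deploy.
ROLLOUT_PERCENT: dict[str, int] = {"promo-code-validation": 100}


def is_enabled(flag: str, user_id: str) -> bool:
    """Deterministically bucket a user into 0-99 and compare with the rollout."""
    percent = ROLLOUT_PERCENT.get(flag, 0)  # unknown flags default to off
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:2], "big") % 100
    return bucket < percent


def kill_switch(flag: str) -> None:
    """Disable a feature for every user by setting its exposure to 0%."""
    ROLLOUT_PERCENT[flag] = 0
```

Hashing the flag name together with the user ID keeps each user's experience stable during a partial rollout, while different flags land on different subsets of users.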

# Kubernetes rollback: check the history, then restore a previous revision.
# Rollback time depends on image pulls and readiness probes (seconds to a few minutes).

# 1. Inspect rollout history (CHANGE-CAUSE comes from the
#    kubernetes.io/change-cause annotation; it shows <none> if never set)
kubectl rollout history deployment/api-server -n production
# REVISION  CHANGE-CAUSE
# 1         initial deploy
# 2         add payment retry logic      <-- last known good
# 3         add promo-code validation    <-- currently running (broken)

# 2a. Undo to the immediately previous revision (here, revision 2)
kubectl rollout undo deployment/api-server -n production

# 2b. Or undo to a specific revision instead (run 2a or 2b, not both)
kubectl rollout undo deployment/api-server --to-revision=2 -n production

# 3. Verify the restore
kubectl rollout status deployment/api-server -n production
# Waiting for deployment "api-server" rollout to finish: 2 out of 3 new replicas have been updated...
# deployment "api-server" successfully rolled out

Metric 4 — Change Failure Rate

Definition: What percentage of deployments to production result in a degraded service or require a hotfix/rollback?

Change failure rate measures the quality of your delivery process. It is the ratio of bad deployments to total deployments. Note that a bad deployment is one that causes user-visible impact — not one that has a bug discovered in a code review.

DORA performance bands (2021 report):

Elite: 0–15%
High, Medium, and Low: 16–30% (in that report, change failure rate separated elite teams from everyone else rather than ranking the other three)

The most important insight from DORA research: high deployment frequency does not cause high failure rate. Elite performers deploy most often and have the lowest failure rate. The mechanism is batch size — small, frequent changes are easier to test, review, and reason about than large quarterly releases.
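A toy probability model helps show why batch size matters. Assume, purely for illustration, that every change independently has a 2% chance of causing a user-visible failure. The chance that a release of n changes contains at least one bad change is then 1 - (1 - p)^n. The function name `p_release_has_defect` and the 2% figure are assumptions, not DORA data.

```python
def p_release_has_defect(p_per_change: float, changes_per_release: int) -> float:
    """Probability that a release contains at least one defective change,
    assuming each change independently fails with probability p_per_change."""
    return 1 - (1 - p_per_change) ** changes_per_release


for n in (1, 10, 50):
    print(f"{n:>3} changes per release -> {p_release_has_defect(0.02, n):.1%}")
```

Under these assumptions a single-change release fails 2% of the time, a 10-change release about 18% of the time, and a 50-change release about 64% of the time, and in the large release you must then find the culprit among 50 candidates. The model leaves out DORA's further point that small changes are also easier to test and review, which lowers p itself.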

The four DORA metrics fall into two groups: throughput (deployment frequency and lead time) and stability (MTTR and change failure rate).

Collecting DORA Metrics in Practice

Most organisations instrument DORA metrics by combining three data sources: their CI/CD platform (deployment frequency and lead time), their incident management tool (MTTR), and their deployment log or change management system (change failure rate). The important thing is to automate the collection — manually counting incidents at the end of a quarter produces misleading numbers.

# Example: calculate DORA metrics from a structured deployment log
# incidents.csv format (one row per deployment; incident columns empty if none):
# deployment_id,deploy_ts,incident_start_ts,incident_end_ts,caused_by_deploy
# d1001,2025-06-01T09:00:00Z,,,false
# d1002,2025-06-01T14:00:00Z,2025-06-01T14:05:00Z,2025-06-01T14:52:00Z,true
# d1003,2025-06-02T10:30:00Z,,,false

python3 - <<'EOF'
import csv
from datetime import datetime

def parse_ts(value):
    # fromisoformat() accepts a trailing "Z" only from Python 3.11, so normalise it
    return datetime.fromisoformat(value.strip().replace("Z", "+00:00"))

deployments = []
restore_minutes = []   # time to restore for each deployment-caused incident

with open('incidents.csv', newline='') as f:
    for row in csv.DictReader(f):
        deployments.append(row)
        if row['caused_by_deploy'].strip().lower() == 'true':
            start = parse_ts(row['incident_start_ts'])
            end   = parse_ts(row['incident_end_ts'])
            restore_minutes.append((end - start).total_seconds() / 60)

total    = len(deployments)
failures = len(restore_minutes)
cfr  = (failures / total) * 100 if total else 0
mttr = sum(restore_minutes) / failures if failures else 0

print(f"Total deployments  : {total}")
print(f"Failed deployments : {failures}")
print(f"Change failure rate: {cfr:.1f}%")
print(f"MTTR               : {mttr:.1f} min")
EOF

The Speed vs Stability Trade-off — A Myth

A common objection to aiming for high deployment frequency is "we cannot move faster because stability will suffer." DORA data consistently refutes this. Throughput (deployment frequency, lead time) and stability (MTTR, change failure rate) are positively correlated — teams that score high on one tend to score high on the other. The mechanism is again batch size: small changes are inherently less risky.

Practical starting point: If your team is in the Medium or Low band, do not try to optimise all four metrics at once. Pick the biggest bottleneck. For many teams that is lead time: a slow CI pipeline (20+ minutes) or a manual approval gate that waits for a weekly change advisory board (CAB). Cutting lead time from 5 days to 4 hours often improves deployment frequency, MTTR, and change failure rate at the same time, because it pushes the team towards smaller batches.
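Finding the biggest bottleneck can start with something as simple as breaking one typical change's lead time into stages. The stage names and durations below are hypothetical, loosely modelled on the weekly approval gate mentioned above (a change waits about half a week, on average, for a weekly meeting).

```python
# Hypothetical breakdown of one change's lead time, in hours.
stages = {
    "code review wait": 6.0,
    "CI pipeline": 0.5,
    "weekly CAB approval": 84.0,  # ~3.5 days: average wait for a weekly meeting
    "production deploy": 0.5,
}

total = sum(stages.values())
bottleneck = max(stages, key=stages.get)  # stage with the longest duration
share = stages[bottleneck] / total

print(f"total lead time   : {total:.1f} h")
print(f"biggest bottleneck: {bottleneck} ({share:.0%} of lead time)")
```

Here the approval wait accounts for over 90% of lead time, so halving CI duration would barely move the total.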

In Lesson 6 you will look at how these metrics connect to the broader story of infrastructure evolution — understanding where your system runs is as important as understanding how well you deliver changes to it.