Performance & Load Testing

k6 in Practice

18 min Lesson 3 of 28

Lesson 2 established load testing theory: virtual users, ramp shapes, percentile math. This lesson moves to the tool you will spend the most time in: k6. Originally built by Load Impact and now maintained by Grafana Labs, k6 is one of the most widely used tools for developer-owned load testing. Its engine is written in Go, so a single well-provisioned machine can typically drive tens of thousands of VUs. Scripts are written in JavaScript (ES2015+), and the tool is designed from the ground up to run inside a CI pipeline. Many SRE teams version k6 scripts alongside service code, with a smoke test gating every PR and a full soak test run against every release candidate.

k6 is not a browser automation tool. Its default engine generates HTTP, WebSocket, and gRPC traffic at the protocol level and never executes page JavaScript in a browser. If you need real-browser load testing (for example, SPAs that do heavy client-side rendering), use the k6 browser module (k6/browser, which started life as the xk6-browser extension). For pure API and backend load testing, the default protocol-level engine is what you want.

Script Structure: the Anatomy of a k6 Test

Every k6 script exports a default function, the VU body: the code each virtual user executes in a loop. Module-level code forms the init context, which runs once per VU before the test starts. Two lifecycle hooks are optional: setup(), which runs once before any VU starts, and teardown(data), which runs once after all VUs finish. Whatever setup() returns is passed as data to both the default function and teardown().

```javascript
// checkout-flow.js: production-grade k6 script skeleton
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Trend, Rate, Counter } from 'k6/metrics';

// --- Custom metrics (defined in the init context; samples from all VUs aggregate into one metric) ---
const checkoutLatency = new Trend('checkout_latency_ms', true); // true = isTime: values are displayed as durations
const checkoutErrors = new Rate('checkout_error_rate');
const checkoutCount = new Counter('checkouts_attempted');

// --- Stages and thresholds (the test contract) ---
export const options = {
  stages: [
    { duration: '2m', target: 50 },  // ramp up to 50 VUs
    { duration: '5m', target: 50 },  // hold steady load
    { duration: '2m', target: 200 }, // spike to 200 VUs
    { duration: '5m', target: 200 }, // hold spike
    { duration: '2m', target: 0 },   // ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500', 'p(99)<1500'], // SLO gate
    checkout_error_rate: ['rate<0.01'],             // <1% failed checkouts
    http_req_failed: ['rate<0.005'],                // k6 built-in failure rate
  },
};

// setup() runs ONCE before VUs start; its return value is passed to default() and teardown()
export function setup() {
  const res = http.post(
    'https://api.example.com/auth/token',
    JSON.stringify({
      client_id: 'load-test-bot',
      client_secret: __ENV.API_SECRET, // inject secrets via env, never hardcode
    }),
    { headers: { 'Content-Type': 'application/json' } },
  );
  if (!check(res, { 'auth OK': (r) => r.status === 200 })) {
    // Throwing in setup() aborts the run instead of load-testing with no token
    throw new Error(`auth failed with status ${res.status}`);
  }
  return { token: res.json('access_token') };
}

// default() is the VU loop: called repeatedly for each VU
export default function (data) {
  const headers = {
    Authorization: `Bearer ${data.token}`,
    'Content-Type': 'application/json',
  };

  const res = http.post(
    'https://api.example.com/checkout',
    JSON.stringify({
      cart_id: `cart-${__VU}-${__ITER}`, // __VU = 1-based VU number, __ITER = 0-based per-VU iteration
      promo_code: 'LOAD_TEST',
    }),
    { headers },
  );

  // res.timings.duration matches http_req_duration (excludes connection and TLS setup)
  checkoutLatency.add(res.timings.duration);
  checkoutCount.add(1);

  const ok = check(res, {
    'status 200': (r) => r.status === 200,
    // guard on status so an HTML error page does not make r.json() throw
    'order_id present': (r) => r.status === 200 && r.json('order_id') !== undefined,
  });
  checkoutErrors.add(!ok);

  sleep(1); // think time between iterations (model real user pace)
}

export function teardown(data) {
  // revoke the test token to leave the auth system clean
  http.del('https://api.example.com/auth/token', null, {
    headers: { Authorization: `Bearer ${data.token}` },
  });
}
```
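The lifecycle is easier to see with the HTTP calls stripped out. The sketch below is plain JavaScript that runs in Node, not k6 code: runLifecycle and the shape of the script object are hypothetical, VUs run one after another here instead of concurrently, and init is called exactly once per VU (k6 also evaluates init code a few extra times, for example to read options, which this model omits). It does mirror two real details: __VU is 1-based while __ITER is 0-based, and each VU receives its own copy of the JSON-serializable value returned by setup().

```javascript
// Sequential, plain-JavaScript model of the k6 lifecycle (NOT k6 code).
// Real k6 gives each VU its own JS runtime and runs VUs concurrently;
// this sketch only shows call order and how setup()'s return value flows.
function runLifecycle(script, { vus, iterationsPerVU }) {
  const calls = [];

  // Init context: module-level code, evaluated once per VU before the test starts.
  for (let vu = 1; vu <= vus; vu++) {
    script.init(vu);
    calls.push(`init:${vu}`);
  }

  // setup() runs exactly once.
  const data = script.setup();
  calls.push('setup');

  // The default function is the VU body, called in a loop.
  // vu is 1-based like __VU; iter is 0-based like __ITER.
  for (let vu = 1; vu <= vus; vu++) {
    const vuData = JSON.parse(JSON.stringify(data)); // each VU gets its own copy of setup() data
    for (let iter = 0; iter < iterationsPerVU; iter++) {
      script.defaultFn(vuData, vu, iter);
      calls.push(`default:${vu}:${iter}`);
    }
  }

  // teardown(data) runs exactly once, after all VUs finish.
  script.teardown(data);
  calls.push('teardown');
  return calls;
}

// Usage with a hypothetical script whose setup() stands in for the auth call.
const demoScript = {
  init: () => {},
  setup: () => ({ token: 'demo-token' }),
  defaultFn: (data, vu, iter) => console.log(`cart-${vu}-${iter} uses ${data.token}`),
  teardown: (data) => console.log(`revoke ${data.token}`),
};
console.log(runLifecycle(demoScript, { vus: 2, iterationsPerVU: 2 }));
```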

Stages vs. Scenarios: Choosing the Right Shape

The stages array is the quick way to define a single VU ramp profile. But real production traffic is not a single pool of identical users. The scenarios API gives you independent executor pools, each with its own VU count, ramp shape, arrival rate, and script function — composable into a realistic load model.

The key executors and when to use them:

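  • ramping-vus: the classic ramp. You control the VU count over time. Good for soak tests and spike drills, and the executor k6 uses when you write top-level stages.
  • constant-vus: a fixed number of VUs looping for a fixed duration. Suitable for steady background traffic such as admin polling.
  • constant-arrival-rate: you specify how many iterations start per timeUnit, not a VU count. k6 takes VUs from the preAllocatedVUs pool and initializes more, up to maxVUs, when the pool runs short. An iteration equals a request only if the iteration body makes exactly one request. Use this to model a fixed inbound rate (e.g., 500 RPS from a load balancer) independently of how fast or slow your service responds. This is the right executor for SLO gate tests: you want to assert behavior at a known rate, not at an arbitrary VU count.
  • ramping-arrival-rate: the arrival-rate counterpart of ramping-vus; the iteration rate ramps linearly between stage targets. Good for finding the throughput cliff.
  • per-vu-iterations: each VU runs exactly N iterations. Useful for data-driven tests where each VU needs its own unique dataset rows.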
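Because arrival-rate executors start iterations on a schedule, the number of VUs they need depends on how long each iteration takes. Little's law gives a rough estimate: busy VUs ≈ iteration start rate × average iteration duration (including think time). The plain-JavaScript sketch below uses hypothetical function names, and the iteration durations are assumed numbers you would take from a baseline run. It shows why a service slowdown can exhaust maxVUs, and why RPS equals the iteration rate only when each iteration makes one request.

```javascript
// Rough VU sizing for arrival-rate executors via Little's law:
// concurrent VUs ≈ iteration start rate × average iteration duration.
// Assumption: the iteration duration includes every request plus sleep() think time.
function estimateVUs(iterationsPerSecond, avgIterationSeconds, headroom = 1.0) {
  return Math.ceil(iterationsPerSecond * avgIterationSeconds * headroom);
}

// k6 schedules iteration starts, so the request rate scales with the
// number of requests made inside each iteration.
function requestsPerSecond(iterationsPerSecond, requestsPerIteration) {
  return iterationsPerSecond * requestsPerIteration;
}

// 300 iterations/s, each taking ~0.5 s → about 150 busy VUs at steady state.
console.log(estimateVUs(300, 0.5)); // 150
// If iterations slow to 2 s, the same rate needs ~600 VUs, more than a
// maxVUs of 200 allows; k6 would then count unstarted iterations as dropped_iterations.
console.log(estimateVUs(300, 2)); // 600
// Three requests per iteration turn 300 iterations/s into 900 requests/s.
console.log(requestsPerSecond(300, 3)); // 900
```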
```javascript
// multi-scenario.js: composing read traffic + write traffic + admin traffic
export const options = {
  scenarios: {
    // Scenario 1: high-volume read traffic at a constant arrival rate
    browse_products: {
      executor: 'constant-arrival-rate',
      rate: 300,           // 300 iterations started per timeUnit
      timeUnit: '1s',      // i.e. 300 iterations/s (= 300 RPS if each iteration makes one request)
      duration: '10m',
      preAllocatedVUs: 50, // initialize VUs up front so mid-test VU creation doesn't skew latency
      maxVUs: 200,         // allow k6 to add VUs if 50 cannot sustain 300 iterations/s
      exec: 'browseFlow',  // name of an exported function in this file
    },
    // Scenario 2: lower-volume write traffic, ramping 10 → 50 → 10 iterations/s
    create_orders: {
      executor: 'ramping-arrival-rate',
      startRate: 10,
      timeUnit: '1s',
      stages: [
        { target: 10, duration: '2m' },
        { target: 50, duration: '5m' },
        { target: 10, duration: '2m' },
      ],
      preAllocatedVUs: 20,
      maxVUs: 100,
      exec: 'checkoutFlow',
    },
    // Scenario 3: admin polling from a small, fixed pool of VUs
    admin_reports: {
      executor: 'constant-vus',
      vus: 5,
      duration: '10m',
      exec: 'adminFlow',
      startTime: '30s', // start 30s after the test begins, once warm-up has finished
    },
  },
  thresholds: {
    'http_req_duration{scenario:browse_products}': ['p(95)<200'],
    'http_req_duration{scenario:create_orders}': ['p(95)<500'],
    'http_req_failed': ['rate<0.005'],
  },
};

export function browseFlow() { /* ... */ }
export function checkoutFlow() { /* ... */ }
export function adminFlow() { /* ... */ }
```
[Figure: k6 Scenarios: Composing Multiple Executor Pools. Timeline from 0 to 10 min showing browse_products (constant-arrival-rate, 300 RPS, up to 200 VUs auto-scaled), create_orders (ramping-arrival-rate, 10→50→10 RPS, up to 100 VUs), and admin_reports (constant-vus, 5 VUs, starting at +30s), annotated with per-scenario thresholds p95 < 200 ms (browse) and p95 < 500 ms (orders).]
Three independent k6 scenario executors composing a realistic production traffic mix: high-volume reads at constant RPS, ramping writes, and low-rate admin polling with a delayed start.
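In ramping-arrival-rate, the iteration rate moves linearly from one stage target to the next, so the number of iterations a scenario schedules is the area under that rate curve. The plain-JavaScript model below (hypothetical helper names; durations in seconds rather than k6's '2m' strings) computes the instantaneous rate and the expected iteration count for the create_orders scenario, which helps when sizing test data such as how many orders the run will create. Iterations that k6 cannot start because maxVUs is exhausted are reported as dropped_iterations and are not subtracted here.

```javascript
// Model of a ramping-arrival-rate profile: the rate moves linearly from
// startRate to the first stage target, then from target to target.
function rateAt(tSec, startRate, stages) {
  let from = startRate;
  let elapsed = 0;
  for (const { target, durationSec } of stages) {
    if (tSec <= elapsed + durationSec) {
      const frac = (tSec - elapsed) / durationSec; // progress through this stage, 0..1
      return from + (target - from) * frac;
    }
    elapsed += durationSec;
    from = target;
  }
  return from; // after the last stage the model holds the final target
}

// Scheduled iterations = area under the rate curve (one trapezoid per stage).
function expectedIterations(startRate, stages) {
  let from = startRate;
  let total = 0;
  for (const { target, durationSec } of stages) {
    total += ((from + target) / 2) * durationSec;
    from = target;
  }
  return total;
}

// The create_orders scenario, in iterations per second.
const createOrdersStages = [
  { target: 10, durationSec: 120 }, // hold 10/s for 2m
  { target: 50, durationSec: 300 }, // ramp to 50/s over 5m
  { target: 10, durationSec: 120 }, // ramp back to 10/s over 2m
];
console.log(rateAt(270, 10, createOrdersStages));        // 30 (halfway up the 5m ramp)
console.log(expectedIterations(10, createOrdersStages)); // 13800
```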

Thresholds: Making Tests Self-Enforcing

A load test without thresholds is just data collection. Thresholds are the executable SLO: if any threshold is breached, k6 exits with a non-zero status code (99), your CI pipeline fails, and the release is blocked. This is arguably the most important feature k6 offers: it turns a performance test into a pass/fail release gate.

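Threshold expressions combine an aggregation method, a comparison operator (<, <=, >, >=, ==, ===, !=), and a value. The available aggregations depend on the metric type, for built-in and custom metrics alike: count and rate for Counters, value for Gauges, rate for Rates, and avg, min, max, med, and p(N) for Trends: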

  • 'p(95)<500': 95th-percentile response time under 500 ms
  • 'p(99)<2000': 99th-percentile under 2 s (the long-tail SLO)
  • 'rate<0.01': less than 1% error rate on a Rate metric
  • 'count>=1000' on a Counter: at least 1,000 recorded events, such as completed checkouts (useful for data-coverage assertions)
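To make the aggregation methods concrete, here is a plain-JavaScript model (not k6's implementation) that parses a threshold string and checks it against raw samples. It supports only the four ordering operators and a subset of aggregations, and the function names are hypothetical. Percentiles use linear interpolation between closest ranks; treat that as an assumption about the method rather than a guarantee of digit-for-digit agreement with k6.

```javascript
// Plain-JavaScript model of checking a threshold against collected samples.

// Percentile by linear interpolation between closest ranks.
function percentile(values, pct) {
  const s = [...values].sort((a, b) => a - b);
  const idx = (pct / 100) * (s.length - 1);
  const lo = Math.floor(idx);
  const hi = Math.ceil(idx);
  return s[lo] + (s[hi] - s[lo]) * (idx - lo);
}

// A subset of aggregations, keyed by metric type.
function aggregate(kind, samples, agg) {
  if (kind === 'rate' && agg === 'rate') {
    return samples.filter(Boolean).length / samples.length; // share of truthy samples
  }
  if (kind === 'trend') {
    const pm = /^p\((\d+(?:\.\d+)?)\)$/.exec(agg);
    if (pm) return percentile(samples, Number(pm[1]));
    if (agg === 'avg') return samples.reduce((a, b) => a + b, 0) / samples.length;
    if (agg === 'min') return Math.min(...samples);
    if (agg === 'max') return Math.max(...samples);
    if (agg === 'med') return percentile(samples, 50);
  }
  if (kind === 'counter' && agg === 'count') {
    return samples.reduce((a, b) => a + b, 0); // sum of everything passed to add()
  }
  throw new Error(`unsupported aggregation ${agg} for ${kind}`);
}

const COMPARATORS = {
  '<': (a, b) => a < b,
  '<=': (a, b) => a <= b,
  '>': (a, b) => a > b,
  '>=': (a, b) => a >= b,
};

// Parse "<aggregation><operator><number>", e.g. 'p(95)<500' or 'count>=1000'.
function checkThreshold(kind, samples, expr) {
  const m = /^\s*([a-z]+(?:\(\d+(?:\.\d+)?\))?)\s*(<=|>=|<|>)\s*(-?\d+(?:\.\d+)?)\s*$/.exec(expr);
  if (!m) throw new Error(`cannot parse threshold "${expr}"`);
  const observed = aggregate(kind, samples, m[1]);
  return { observed, passed: COMPARATORS[m[2]](observed, Number(m[3])) };
}

// Usage: five latency samples (ms) and four checkout outcomes (true = error).
const latencies = [100, 200, 300, 400, 500];
console.log(checkThreshold('trend', latencies, 'p(95)<500'));                // passes, p95 is about 480
console.log(checkThreshold('rate', [false, false, true, false], 'rate<0.01')); // fails, rate is 0.25
```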

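You can attach abortOnFail: true and a delayAbortEval duration to a threshold. abortOnFail stops the test as soon as the threshold is breached instead of running to completion; delayAbortEval postpones that evaluation until enough samples have accumulated, so a noisy warm-up period cannot abort the run. Together they keep you from burning load on a system that is already failing.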

```javascript
export const options = {
  thresholds: {
    // Standard SLO gates (fail CI if breached)
    http_req_duration: [
      { threshold: 'p(95)<500', abortOnFail: true, delayAbortEval: '1m' },
      { threshold: 'p(99)<2000' },
    ],
    http_req_failed: [
      { threshold: 'rate<0.005', abortOnFail: true, delayAbortEval: '30s' },
    ],
    // Sub-metric thresholds: the built-in metric filtered by tag (per-endpoint breakdown)
    'http_req_duration{url:https://api.example.com/checkout}': ['p(95)<600'],
    'http_req_duration{url:https://api.example.com/catalog}': ['p(95)<150'],
    // Custom business metric
    checkout_error_rate: ['rate<0.01'],
  },
  // noVUConnectionReuse: false (the default); keep it false for realistic keep-alive behavior
  // insecureSkipTLSVerify: false (the default); never set true in real tests, you want to catch cert issues
};
```
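The value of delayAbortEval is easiest to see with numbers. This hypothetical model (plain JavaScript, not k6's scheduler) checks cumulative failure counts at a few checkpoints against 'rate<0.005'; the checkpoint data is invented for illustration. With no delay, a noisy warm-up aborts the run at 10 s; with a 30 s delay, the run survives warm-up and aborts only once failures are real.

```javascript
// Hypothetical model of abortOnFail + delayAbortEval for a Rate threshold
// such as http_req_failed: 'rate<0.005'. Each checkpoint holds cumulative
// counts since the test began (k6 re-evaluates thresholds periodically).
function firstAbortSecond(checkpoints, maxRate, delayAbortEvalSec) {
  for (const { atSec, failed, total } of checkpoints) {
    if (atSec < delayAbortEvalSec) continue; // still inside the grace period
    const rate = total === 0 ? 0 : failed / total;
    if (!(rate < maxRate)) return atSec;     // 'rate<maxRate' breached: abort here
  }
  return null; // threshold never breached after the delay
}

// Invented cumulative counts for illustration.
const demoCheckpoints = [
  { atSec: 10, failed: 2, total: 50 },    // 4% during warm-up
  { atSec: 30, failed: 2, total: 1000 },  // 0.2%: recovered
  { atSec: 60, failed: 30, total: 2000 }, // 1.5%: genuinely failing
];
console.log(firstAbortSecond(demoCheckpoints, 0.005, 0));  // 10 (aborts on warm-up noise)
console.log(firstAbortSecond(demoCheckpoints, 0.005, 30)); // 60
```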
Tag your HTTP requests by name, not URL. By default k6 tags each request with its full URL, so when a URL contains dynamic IDs like /orders/12345, every unique URL becomes its own time series and your dashboards turn into noise. Set { tags: { name: 'GET /orders/:id' } } on each such request, or build the URL with the http.url template tag (http.url`https://api.example.com/orders/${orderId}`), which uses the template, with the interpolated values replaced by placeholders, as the name tag. Without this grouping, per-endpoint thresholds and Grafana dashboards are not meaningful.
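Conceptually, a name tag collapses many concrete URLs into one series. The helper below is a hypothetical plain-JavaScript illustration of that grouping; it treats numeric and UUID path segments as IDs, which is an assumption about your URL scheme. In a k6 script you would set the name tag or use http.url directly rather than run code like this.

```javascript
// Hypothetical helper: collapse dynamic path segments into a stable name,
// the effect that { tags: { name: 'GET /orders/:id' } } achieves in k6.
const DYNAMIC_SEGMENT = /^(\d+|[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})$/i;

function nameTag(method, rawUrl) {
  const { pathname } = new URL(rawUrl);
  const normalized = pathname
    .split('/')
    .map((seg) => (DYNAMIC_SEGMENT.test(seg) ? ':id' : seg)) // numeric IDs and UUIDs → :id
    .join('/');
  return `${method} ${normalized}`;
}

// 1,000 distinct order URLs collapse into a single series name.
const orderUrls = Array.from({ length: 1000 }, (_, i) => `https://api.example.com/orders/${10000 + i}`);
const distinctNames = new Set(orderUrls.map((u) => nameTag('GET', u)));
console.log(distinctNames.size);    // 1
console.log([...distinctNames][0]); // GET /orders/:id
```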

Running k6: Local, Distributed, and in CI

For local development and debugging, a single-machine run is all you need. How far one machine goes depends heavily on script complexity, machine size, and network stack: complex scripts may top out at a few thousand VUs, while a large, tuned machine can drive tens of thousands. Beyond that ceiling, distribute the load across multiple nodes with k6 run --execution-segment or the k6 Kubernetes operator (k6-operator).

```bash
# --- Local run ---
k6 run --vus 50 --duration 5m checkout-flow.js

# Pass secrets via environment (never bake them into the script)
k6 run -e API_SECRET="$API_SECRET" checkout-flow.js

# --- Output to InfluxDB (v1) + Grafana dashboard (standard SRE setup) ---
k6 run --out influxdb=http://influxdb:8086/k6 checkout-flow.js

# --- Output to Prometheus remote-write (modern stack) ---
# Prometheus must have its remote-write receiver enabled
K6_PROMETHEUS_RW_SERVER_URL=http://prometheus:9090/api/v1/write \
  k6 run --out experimental-prometheus-rw checkout-flow.js

# --- Distributed run across 3 nodes (each handles 1/3 of the load) ---
# Node 1:
k6 run --execution-segment "0:1/3" --execution-segment-sequence "0,1/3,2/3,1" checkout-flow.js
# Node 2:
k6 run --execution-segment "1/3:2/3" --execution-segment-sequence "0,1/3,2/3,1" checkout-flow.js
# Node 3:
k6 run --execution-segment "2/3:1" --execution-segment-sequence "0,1/3,2/3,1" checkout-flow.js

# --- GitHub Actions CI gate ---
# .github/workflows/perf.yml (fragment)
#   - name: Run k6 load test
#     uses: grafana/k6-action@v0.3.1
#     with:
#       filename: tests/load/checkout-flow.js
#       flags: --out influxdb=http://influxdb:8086/k6
#     env:
#       API_SECRET: ${{ secrets.API_SECRET }}
```
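The execution-segment flags follow a regular pattern, so they can be generated for any number of nodes. This plain-JavaScript helper (hypothetical name, equal shares only) reproduces the 3-node commands above and reduces fractions such as 2/4 to 1/2. It only builds command strings: starting the nodes together and combining their results is still up to you or the k6-operator, and because each instance evaluates thresholds on its own share of traffic, send all outputs to a shared backend for a combined view.

```javascript
// Build equal-share --execution-segment commands for n load-generator nodes.
function gcd(a, b) {
  return b === 0 ? a : gcd(b, a % b);
}

// Format i/n as a reduced fraction, using "0" and "1" for the endpoints.
function fraction(i, n) {
  if (i === 0) return '0';
  if (i === n) return '1';
  const g = gcd(i, n);
  return `${i / g}/${n / g}`;
}

function segmentCommands(n, scriptPath = 'checkout-flow.js') {
  const points = Array.from({ length: n + 1 }, (_, i) => fraction(i, n));
  const sequence = points.join(','); // every node must use the same sequence
  return points.slice(0, n).map(
    (start, i) =>
      `k6 run --execution-segment "${start}:${points[i + 1]}" --execution-segment-sequence "${sequence}" ${scriptPath}`,
  );
}

segmentCommands(3).forEach((cmd) => console.log(cmd));
```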
Do not load-test production from a single laptop. Responses flow down to the load generator, so your downlink caps throughput: with 50 Mbps of download bandwidth and 10 KB responses, you saturate the link at roughly 600 responses per second (fewer once headers and TCP/TLS overhead are counted) before the server is stressed at all. Run load generators from inside the same VPC as the target, on machines with sufficient network bandwidth. Results from a network-bottlenecked test are meaningless: they measure your connection, not your service.
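The bandwidth arithmetic is worth running before any large test. This sketch assumes 1 Mbps = 1,000,000 bits/s and counts only response bodies, so real ceilings are lower once headers and TCP/TLS overhead are included; the function names are hypothetical.

```javascript
// Upper bound on responses/second that a link can carry (body bytes only).
function bandwidthCeilingRps(linkMbps, responseBytes) {
  const bitsPerResponse = responseBytes * 8;
  return (linkMbps * 1e6) / bitsPerResponse;
}

// Link capacity needed to receive a target response rate (body bytes only).
function requiredMbps(targetRps, responseBytes) {
  return (targetRps * responseBytes * 8) / 1e6;
}

console.log(bandwidthCeilingRps(50, 10000));   // 625 (a 50 Mbps home downlink)
console.log(bandwidthCeilingRps(1000, 10000)); // 12500 (a 1 Gbps in-VPC link)
console.log(requiredMbps(2000, 10000));        // 160
```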

Realistic Data: Avoiding the Cache-Warming Trap

A load test that hammers a single product ID will warm your Redis cache on the first request and measure cache-hit latency for the remaining 99.9% of iterations. That tells you nothing about uncached-path performance. Production traffic hits thousands of distinct IDs. Use SharedArray to load a realistic dataset once (not per-VU) and spread the load across all IDs.

```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';
import { SharedArray } from 'k6/data';
import { randomItem } from 'https://jslib.k6.io/k6-utils/1.4.0/index.js';

// SharedArray runs its callback ONCE at init time and shares the result read-only
// across all VUs: one copy in memory instead of one per VU (critical at 10k+ VUs)
const products = new SharedArray('products', function () {
  return JSON.parse(open('./data/products.json')); // 10,000 product IDs
});
const users = new SharedArray('users', function () {
  return JSON.parse(open('./data/users.json')); // 5,000 test user accounts
});

export default function () {
  const product = randomItem(products);
  const user = randomItem(users); // e.g. to act as a different test account per iteration

  const res = http.get(`https://api.example.com/products/${product.id}`, {
    tags: { name: 'GET /products/:id' }, // group metrics regardless of ID
  });
  check(res, { 'status 200': (r) => r.status === 200 });

  sleep(Math.random() * 2 + 0.5); // random think time of 0.5 to 2.5 s, not a fixed 1 s
}
```
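To see why a single hot ID misleads, here is a toy plain-JavaScript model of a cold, unbounded, never-expiring cache, a deliberate simplification of a real Redis tier; mulberry32 is a small seeded PRNG used only to make the run reproducible, and the helper names are hypothetical. Hammering one product yields a 99.9% hit ratio after the very first request, while spreading 1,000 requests across 10,000 IDs keeps the large majority on the uncached path.

```javascript
// Small seeded PRNG (mulberry32) so the simulation is reproducible.
function mulberry32(seed) {
  return function () {
    seed = (seed + 0x6d2b79f5) | 0;
    let t = seed;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

// Hit ratio of a cache that starts empty, never evicts, and is filled on every miss.
function cacheHitRatio(pickId, requests) {
  const cache = new Set();
  let hits = 0;
  for (let i = 0; i < requests; i++) {
    const id = pickId();
    if (cache.has(id)) hits++;
    else cache.add(id); // a miss populates the cache
  }
  return hits / requests;
}

const rand = mulberry32(42);
const hotKeyRatio = cacheHitRatio(() => 'product-1', 1000);
const spreadRatio = cacheHitRatio(() => Math.floor(rand() * 10000), 1000);
console.log(hotKeyRatio); // 0.999
console.log(spreadRatio); // far lower: most requests miss the cache
```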

Common Production Failure Modes

Knowing the failure patterns will save you from spending hours on invalid test results:

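  • Coordinated omission: In closed-model executors (ramping-vus, constant-vus), each VU waits for its response before starting the next iteration. When the service slows down, VUs send fewer requests, so the slow period contributes fewer samples and is underrepresented in your percentiles. Use the constant-arrival-rate executor (an open model) to decouple arrival rate from service latency and measure true queuing behavior; if it runs out of VUs, k6 counts the iterations it could not start in the dropped_iterations metric, so watch that too.
  • TLS handshake overhead dominating: Short tests (under 2 minutes) with high VU counts spend much of their time opening new connections, and the burst of TLS handshakes can inflate wall-clock latency and server CPU. k6's http_req_duration already excludes connection setup (tracked separately as http_req_connecting and http_req_tls_handshaking), so record res.timings.duration rather than Date.now() deltas in custom business metrics, and run tests long enough for connection pools to stabilize.
  • DNS resolution bottleneck: Every uncached lookup hits your resolver, and thousands of VUs opening connections can overwhelm an internal DNS server. k6 caches lookups by default (ttl=5m); if caching has been disabled (--dns ttl=0) or you need addresses pinned for the whole run, use --dns ttl=inf, keeping in mind that this also hides DNS-based failover from the test.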
  • Memory leak revealed by long soak: A 5-minute stress test passes; a 2-hour soak exposes a slow memory leak in your service's connection pool. Always run a soak test before any major release.
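Coordinated omission is easiest to understand with a toy simulation. In this plain-JavaScript model (integer milliseconds; the 10 ms latency, the 10-second stall, and the rates are invented for illustration, and the function names are hypothetical), a service freezes from t = 10 s to t = 20 s. A closed-model VU gets stuck on one request during the freeze and records a single slow sample, so its p99 stays at 10 ms; an open model that keeps arriving at 100 requests/s records every request that queued behind the freeze, and its p99 jumps to roughly 9.7 s.

```javascript
// Toy coordinated-omission simulation, all times in integer milliseconds.
// The service answers in 10 ms, but stalls from t=10 s to t=20 s:
// any request arriving during the stall completes at 20,010 ms.
const NORMAL_MS = 10;
const STALL_START = 10000;
const STALL_END = 20000;
const TEST_END = 30000;

function finishTime(startMs) {
  const stalled = startMs >= STALL_START && startMs < STALL_END;
  return (stalled ? STALL_END : startMs) + NORMAL_MS;
}

// Closed model (like ramping-vus/constant-vus): one VU, no think time;
// the next request starts only after the previous response arrives.
function closedModelLatencies() {
  const out = [];
  for (let t = 0; t < TEST_END; t = finishTime(t)) out.push(finishTime(t) - t);
  return out;
}

// Open model (like constant-arrival-rate): a request starts every intervalMs
// regardless of whether earlier requests have finished.
function openModelLatencies(intervalMs = 10) {
  const out = [];
  for (let t = 0; t < TEST_END; t += intervalMs) out.push(finishTime(t) - t);
  return out;
}

// Percentile by linear interpolation between closest ranks.
function percentile(values, pct) {
  const s = [...values].sort((a, b) => a - b);
  const idx = (pct / 100) * (s.length - 1);
  const lo = Math.floor(idx);
  const hi = Math.ceil(idx);
  return s[lo] + (s[hi] - s[lo]) * (idx - lo);
}

const closedLat = closedModelLatencies();
const openLat = openModelLatencies();
console.log(closedLat.length, percentile(closedLat, 99)); // 2000 10
console.log(openLat.length, percentile(openLat, 99));     // 3000 and a p99 near 9.7 s
```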