Analyzing & Reporting Results
Collecting load-test data is the easy part. The hard part is extracting the three answers that every stakeholder actually needs: where does the system break, how close to that limit are we today, and how much runway is left before we need to add capacity or fix the bottleneck? Those three questions map directly to the concepts of saturation points, percentile curves, and capacity headroom. Reading and communicating these correctly is the difference between a performance report that drives decisions and one that gets filed and forgotten.
Reading Percentile Curves
A percentile latency curve (also called a latency-vs-load curve) plots each pXX latency metric against increasing offered load (RPS or concurrency). It is not the same as a latency CDF, which shows the distribution of latencies at a single fixed load. The shape of the curve tells you far more than a single number ever can.
The canonical shape at low load is flat: p50, p95, and p99 are all close together and stable. As you approach the service's natural throughput ceiling, the curve begins to diverge — p99 climbs first and fastest because high percentiles are the first to absorb queueing delay. This divergence is the earliest warning of saturation and it appears well before error rates increase. By the time you see 5xx errors climbing, you are already well past the safe operating zone.
Key observations from the shape of these curves:
- Flat region (headroom zone): All percentiles are stable and close together. The system has spare capacity; adding load does not meaningfully increase latency. This is where you want to operate in steady state.
- Knee / inflection point: The point where p99 begins to diverge sharply from p50 and p95. This is the onset of queueing — the system is approaching its processing limit and requests are waiting. This is the saturation point.
- Divergence magnitude: If p99 is 3–5x p50, that is expected variance from garbage collection pauses and OS scheduling. If it is 20x or more at moderate load, you have a structural problem: lock contention, database connection pool exhaustion, or a downstream service that is single-threaded.
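The divergence check in the last bullet can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the function names and the "elevated" label for ratios between the two thresholds are assumptions, while the 5x and 20x cut-offs come from the text above. It assumes you have the raw per-request latencies (in milliseconds) for one load step.

```python
from statistics import quantiles


def percentile_summary(latencies_ms):
    """Return p50, p95, p99 and the p99/p50 ratio for one load step."""
    # quantiles(n=100) returns the 99 cut points p1..p99 (index 0 is p1).
    cuts = quantiles(latencies_ms, n=100, method="inclusive")
    p50, p95, p99 = cuts[49], cuts[94], cuts[98]
    return {"p50": p50, "p95": p95, "p99": p99, "ratio": p99 / p50}


def classify_divergence(ratio, expected_max=5.0, structural_min=20.0):
    """Label a p99/p50 ratio using the thresholds from the text.

    The middle label ("elevated") is an assumption for the range the
    text does not name explicitly.
    """
    if ratio <= expected_max:
        return "expected"
    if ratio >= structural_min:
        return "structural"
    return "elevated"
```

Run `percentile_summary` once per load step and plot the `ratio` column against load; a ratio that is already in the structural range at moderate load is worth investigating before the ramp goes any higher.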
Identifying Saturation Points
The saturation point is the offered load at which the system transitions from the flat regime, where throughput scales linearly with load and latency stays roughly constant, to the super-linear regime, where latency grows faster than load. Correctly identifying it requires looking at multiple signals simultaneously, not just latency.
In Grafana or a k6 dashboard, open four panels side by side during a ramp-up test: p99 latency, active CPU percentage, connection wait times (for databases: pool wait; for web servers: accept queue depth), and error rate. The saturation point is the load value where two or more of these metrics begin their upward inflection. Single-metric anomalies are often noise; correlated inflection is structural saturation.
After the run, extract the saturation inflection point programmatically rather than eyeballing it.
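One possible way to do this is sketched below in Python. It is a heuristic under stated assumptions, not a standard algorithm: each metric is treated as "inflected" once it exceeds a multiple of its low-load baseline and stays above it for the rest of the ramp, and the saturation point is the lowest load at which at least two metrics have inflected. The function names, the 1.5x factor, and the three-step baseline are all illustrative choices.

```python
from statistics import median


def inflection_load(loads, values, baseline_steps=3, factor=1.5):
    """First load where `values` exceeds factor x its low-load baseline
    and stays above that threshold for the rest of the ramp, else None."""
    threshold = median(values[:baseline_steps]) * factor
    for i in range(baseline_steps, len(values)):
        if all(v > threshold for v in values[i:]):
            return loads[i]
    return None


def saturation_point(loads, metrics, min_signals=2, **kwargs):
    """Lowest load at which at least `min_signals` metrics have inflected."""
    inflections = sorted(
        load
        for load in (inflection_load(loads, series, **kwargs)
                     for series in metrics.values())
        if load is not None
    )
    if len(inflections) < min_signals:
        return None  # not enough correlated signals: treat as noise
    return inflections[min_signals - 1]


# Hypothetical ramp-up data, one value per load step.
loads = [100, 200, 300, 400, 500, 600, 700]
metrics = {
    "p99_ms": [40, 41, 42, 44, 70, 150, 400],
    "pool_wait_ms": [1, 1, 1, 1, 2, 9, 30],
    "error_rate_pct": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 2.0],
}
print(saturation_point(loads, metrics))
```

A baseline-multiple rule suits metrics that are flat at low load (latency, pool wait, error rate). CPU utilization normally rises roughly in proportion to load, so it would need a different test, for example a change in slope, before being fed into the same vote.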
Measuring and Communicating Capacity Headroom
Capacity headroom is the gap between your current peak observed load and your saturation point, expressed as a percentage of the saturation point. A headroom of 40% means current peak sits at 60% of saturation, so traffic can grow by about 67% (1 / 0.6 − 1) from today's peak before performance degrades. A headroom of 5% means one unexpected traffic spike (a viral tweet, a downstream retry storm, a cache flush) lands you in an incident.
The formula is simple: headroom = (saturation_rps - current_peak_rps) / saturation_rps × 100. The art is in agreeing what "current peak" means: use the p99 of your hourly peak RPS over the last 30 days, not the average, and not a synthetic estimate.
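The formula and the "current peak" definition translate directly into code. The sketch below is illustrative: the function names are hypothetical, and it assumes you already have a list of hourly peak RPS values covering the last 30 days. It also reports growth relative to current peak, which is a different number from headroom because the denominator is the current peak rather than the saturation point.

```python
from statistics import quantiles


def current_peak(hourly_peak_rps):
    """p99 of the hourly peak RPS samples (e.g. 30 days x 24 hours)."""
    return quantiles(hourly_peak_rps, n=100, method="inclusive")[98]


def headroom_pct(saturation_rps, current_peak_rps):
    """Headroom as a percentage of the saturation point."""
    if saturation_rps <= 0:
        raise ValueError("saturation_rps must be positive")
    return (saturation_rps - current_peak_rps) / saturation_rps * 100


def absorbable_growth_pct(saturation_rps, current_peak_rps):
    """How much traffic can grow, relative to today's peak, before saturation."""
    if current_peak_rps <= 0:
        raise ValueError("current_peak_rps must be positive")
    return (saturation_rps / current_peak_rps - 1) * 100
```

For a saturation point of 1000 RPS and a current peak of 600 RPS, `headroom_pct` gives 40 while `absorbable_growth_pct` gives roughly 67; state which of the two you mean whenever the number appears in a report.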
When reporting headroom to engineering leadership or SRE reviews, anchor the number to a time-to-exhaustion estimate. If traffic is growing at 15% month-over-month and you have 35% headroom today, current peak sits at 65% of saturation, and compounding growth reaches saturation in ln(1/0.65) / ln(1.15) ≈ 3.1 months, so you have roughly three months at most before you need to act. This framing converts an abstract percentage into a concrete deadline that drives prioritization.
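The time-to-exhaustion estimate follows from compounding growth: traffic must grow by a factor of 1 / (1 − headroom) to reach saturation. A minimal Python sketch, assuming a constant month-over-month growth rate and using a hypothetical function name:

```python
import math


def months_to_saturation(headroom_pct, monthly_growth_pct):
    """Months until current peak reaches saturation under compounding growth.

    headroom_pct is measured against the saturation point, so current peak
    is (1 - headroom) x saturation and must grow by 1 / (1 - headroom).
    """
    if not 0 <= headroom_pct < 100:
        raise ValueError("headroom_pct must be in [0, 100)")
    if monthly_growth_pct <= 0:
        return math.inf  # flat or shrinking traffic never reaches saturation
    required_factor = 1 / (1 - headroom_pct / 100)
    return math.log(required_factor) / math.log(1 + monthly_growth_pct / 100)


print(round(months_to_saturation(35, 15), 1))
```

With 35% headroom and 15% monthly growth this comes out at about three months to saturation itself; the deadline for acting should sit earlier, since the estimate assumes growth stays constant and leaves no margin for spikes.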
k6-traffic-capture) and compare saturation points between environments. A 20% discrepancy is common; a 60% discrepancy means your staging environment is not representative and all budget decisions based on it are suspect.
Writing Performance Reports That Drive Action
A performance report that does not lead to a decision is wasted work. Structure every report around three sections: Findings (what the data shows, with concrete numbers and annotated graphs), Risk Assessment (headroom percentage, time-to-exhaustion at current growth rate, which percentile is closest to SLO breach), and Recommended Actions (ranked by impact-to-effort ratio, with an owner and a deadline). Avoid adjectives — "latency is high" is meaningless; "p99 at 600 RPS is 340 ms, 36% above the 250 ms budget" is actionable.
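To keep findings numeric rather than adjectival, the sentence can be generated from the measurements themselves. The helper below is a hypothetical convenience, shown in Python, that produces a finding in the same form as the example above.

```python
def describe_latency_finding(percentile, rps, observed_ms, budget_ms):
    """Format one latency finding with its distance from the budget."""
    delta_pct = (observed_ms - budget_ms) / budget_ms * 100
    if delta_pct == 0:
        relation = "exactly at"
    else:
        direction = "above" if delta_pct > 0 else "below"
        relation = f"{abs(delta_pct):.0f}% {direction}"
    return (f"{percentile} at {rps} RPS is {observed_ms:g} ms, "
            f"{relation} the {budget_ms:g} ms budget")


print(describe_latency_finding("p99", 600, 340, 250))
```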
For recurring reports (weekly CI gates, quarterly capacity reviews), track the saturation RPS and p99 at peak load as a time-series chart. A saturation RPS that steadily decreases release-over-release means you are accumulating performance debt. A p99 that is slowly climbing toward the SLO threshold is a pre-incident signal that should trigger a code review before it becomes a production fire.
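Both trend checks can be automated once the per-release numbers are stored. The Python sketch below is one simple approach under assumptions of its own: "steadily decreasing" is taken to mean a strict decline across the last three releases, and the SLO projection uses the average per-release change in p99, which is a crude linear extrapolation. Function names and the three-release window are illustrative.

```python
def is_declining(saturation_rps_by_release, min_releases=3):
    """True if saturation RPS fell in each of the last `min_releases` releases."""
    recent = saturation_rps_by_release[-min_releases:]
    if len(recent) < min_releases:
        return False  # not enough history to call it a trend
    return all(later < earlier for earlier, later in zip(recent, recent[1:]))


def releases_until_slo_breach(p99_by_release, slo_ms):
    """Linear projection of how many releases remain before p99 reaches the SLO.

    Returns None when p99 is flat or improving, 0.0 when already at or past it.
    """
    if len(p99_by_release) < 2:
        return None
    slope = (p99_by_release[-1] - p99_by_release[0]) / (len(p99_by_release) - 1)
    if slope <= 0:
        return None
    return max(0.0, (slo_ms - p99_by_release[-1]) / slope)
```

A CI gate could fail, or at least warn, when `is_declining` returns true or when the projected number of releases drops below an agreed threshold.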