Prometheus & Grafana

PromQL Fundamentals

PromQL (Prometheus Query Language) turns raw time-series data into answers. Unlike SQL, which addresses rows in tables, PromQL addresses streams of timestamped float64 samples identified by labels. With it you can compute latency distributions, error budgets, saturation trends, and multi-cluster roll-ups without leaving a single query language.

The Data Model in One Paragraph

Every metric in Prometheus is identified by its metric name and a set of key-value label pairs. The combination of name plus labels uniquely identifies a time series. At query time, Prometheus holds an in-memory block of recent data and can query compacted blocks on disk. PromQL expressions always produce one of four result types: instant vector (one sample per series at a moment), range vector (a window of samples per series), scalar (a plain number), or string. Understanding which type you hold determines which operators you can apply.
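To make the four result types concrete, the sketch below shows one expression of each type. The job value api-gateway is borrowed from later examples in this lesson and is only illustrative.

```promql
# Instant vector: one sample per matching series at the evaluation time
up{job="api-gateway"}

# Range vector: every sample from the last 5 minutes, per series
up{job="api-gateway"}[5m]

# Scalar: a single float with no labels
42

# scalar() converts a single-element instant vector to a scalar
# (it returns NaN if the vector does not have exactly one element)
scalar(count(up{job="api-gateway"}))

# String: a literal value, mostly seen as a function argument
# (for example in label_replace)
"api-gateway"
```

Functions such as rate() accept only range vectors, while aggregations such as sum accept only instant vectors, which is why knowing the type of each sub-expression matters.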

Selectors: Pinning the Series You Want

An instant vector selector names a metric and optionally filters by labels using curly-brace matchers. Four operators are available:

= — exact match
!= — negative exact match
=~ — regex match (RE2 syntax, anchored at both ends automatically)
!~ — negative regex match

# All HTTP request counters for the payments service, production namespace
http_requests_total{namespace="production", service="payments"}

# All 5xx responses across every service (regex)
http_requests_total{status=~"5.."}

# Every job except the legacy scraper
up{job!="legacy-scraper"}

# Range vector: 5-minute window, for rate() later
http_requests_total{job="api-gateway"}[5m]

A metric name is syntactic sugar for the label __name__="http_requests_total". You can query {__name__=~"http_.*", job="api"} to match multiple metrics at once — useful for federation and debugging.

Label matchers compose with AND semantics: every matcher in the braces must match. There is no OR across different label keys at the selector level — use or at the binary-operator level for that.
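The two flavours of OR can be sketched as follows. The status codes and the staging namespace are illustrative values, not taken from a real deployment.

```promql
# OR across values of ONE label: regex alternation inside the selector
http_requests_total{status=~"500|503"}

# OR across DIFFERENT labels: union two selectors with the or operator
http_requests_total{service="payments"}
  or
http_requests_total{namespace="staging"}
```

A series that satisfies both selectors appears once in the result, because or only adds right-hand series whose label set is not already present on the left.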

The rate() and increase() Functions

Raw counter values only go up (apart from resets to zero when a process restarts); what you usually want is the per-second rate of change. rate() and increase() both take a range vector selector, and both compensate for counter resets. rate() returns the per-second average over the window; increase() returns the total increase over the window, extrapolated to cover the full window duration, so the result can be fractional even for an integer counter. The two are related: increase(m[5m]) equals rate(m[5m]) * 300.

# Request rate per second over the last 5 minutes
rate(http_requests_total{job="api-gateway"}[5m])

# Total requests received in the last 1 hour (for SLO burn budgets)
increase(http_requests_total{job="api-gateway"}[1h])

# Error ratio — errors per second divided by total per second
rate(http_requests_total{status=~"5..", job="api-gateway"}[5m])
  /
rate(http_requests_total{job="api-gateway"}[5m])
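Two further queries can help when sanity-checking counters. This is a sketch that reuses the api-gateway job from the examples above; resets() is a built-in PromQL function.

```promql
# Number of counter resets (for example process restarts) seen in the last hour
resets(http_requests_total{job="api-gateway"}[1h])

# increase() is rate() scaled by the window length in seconds (5m = 300s),
# so this difference should be zero for every series, up to floating-point rounding
increase(http_requests_total{job="api-gateway"}[5m])
  -
rate(http_requests_total{job="api-gateway"}[5m]) * 300
```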

As a rule of thumb, make the range window at least 4× your scrape interval, so that a missed scrape or two still leaves enough samples. With a 15-second scrape interval, that means [1m] at minimum. A window that contains fewer than two samples returns no data, because rate() needs at least two samples to compute a slope. Many teams standardize on [5m] for operational dashboards and [1h] or [6h] for SLO burn-rate alerts.

Use irate() — instantaneous rate based only on the last two samples — only when you need to catch very sharp spikes (e.g., traffic bursts in a 15-second window). irate() is noisy and not suitable for alerting; prefer rate() for all production alert rules.
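The difference is easiest to see by graphing the two functions side by side, as in this sketch (same hypothetical api-gateway job as above):

```promql
# Smoothed: per-second average over the whole 5-minute window
rate(http_requests_total{job="api-gateway"}[5m])

# Spiky: per-second rate computed from only the last two samples in the window
irate(http_requests_total{job="api-gateway"}[5m])
```

For irate(), the [5m] range only bounds how far back Prometheus looks for those two samples; it does not smooth the result.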

Aggregation Operators

Aggregation collapses multiple series into fewer series by applying a mathematical reduction. All aggregations accept an optional by or without clause that controls which labels survive the reduction.

Figure: three "api" pod series collapse into one via sum by (job); the single "worker" series forms a group of its own.

The most important aggregation operators:

sum — total across selected labels (traffic, bytes, counts)
avg — mean value (average latency, average CPU across pods)
max / min — worst/best instance (useful for saturation)
count — number of series matching (fleet size, instance count)
topk(n, expr) / bottomk(n, expr) — top/bottom N series
quantile(φ, expr) — φ-quantile (0 ≤ φ ≤ 1) of the sample values across series (not the same as histogram quantiles)
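The by and without clauses are two ways of stating the same kind of grouping. The sketch below assumes the series carry service, pod, and instance labels, which depends on your relabeling configuration.

```promql
# by: keep only the service label; every other label is aggregated away
sum by (service) (rate(http_requests_total[5m]))

# without: drop only pod and instance; every other label survives
sum without (pod, instance) (rate(http_requests_total[5m]))

# quantile across series: median per-pod request rate within each service
quantile by (service) (
  0.5,
  sum by (service, pod) (rate(http_requests_total[5m]))
)
```

without tends to be more robust when new labels are added to a metric later, because it names only what should disappear.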

# Total request rate across all pods, grouped by service
sum by (service) (
  rate(http_requests_total[5m])
)

# 99th-percentile latency from a histogram, per service
histogram_quantile(
  0.99,
  sum by (service, le) (
    rate(http_request_duration_seconds_bucket[5m])
  )
)

# Top 5 services by error rate
topk(5,
  sum by (service) (rate(http_requests_total{status=~"5.."}[5m]))
  /
  sum by (service) (rate(http_requests_total[5m]))
)

# Count of successfully scraped targets per namespace (up == 1 means the scrape succeeded; assumes targets carry a namespace label)
count by (namespace) (up == 1)

When aggregating histogram buckets before histogram_quantile(), always use sum by (..., le) rather than a bare sum. histogram_quantile() needs the le label to tell the buckets apart; if the aggregation drops it, the function has no buckets to work with and cannot return a meaningful quantile. This is one of the most common PromQL mistakes in production.
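A side-by-side sketch of the mistake and the fix, using the same histogram metric as the example above:

```promql
# Broken: bare sum also removes le, which histogram_quantile() needs
histogram_quantile(
  0.99,
  sum(rate(http_request_duration_seconds_bucket[5m]))
)

# Fleet-wide p99: aggregate every label away except le
histogram_quantile(
  0.99,
  sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
)
```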

Binary Operations and Matching

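PromQL supports arithmetic (+ - * / % ^), comparison (== != > < >= <=), and logical (and or unless) operators. Arithmetic and comparison operators work between two instant vectors or between a vector and a scalar; the logical operators work only between two instant vectors. By default, two series match when their label sets are identical. When the label sets differ, use on(...) or ignoring(...) to choose the labels to match on, and group_left / group_right for many-to-one and one-to-many joins.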

# CPU usage ratio per instance: scalar 1 minus the average idle fraction across CPU cores
1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))

# Attach a "region" label from a target-info metric (many-to-one join)
rate(http_requests_total[5m])
  * on(instance) group_left(region)
target_info

Comparison operators return the left-hand side value when the condition is true, and drop the series when it is false. This is how alert expressions work: rate(errors[5m]) > 0.01 returns only the series whose error rate currently exceeds 0.01 errors per second. To alert on a 1 % error ratio instead, divide the error rate by the total request rate before comparing.
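The filtering behaviour, and the bool modifier that switches it off, can be sketched as follows. The 1 % threshold and the api-gateway job are illustrative.

```promql
# Filter: keeps only services whose 5xx ratio exceeds 1 %; the value is the ratio
sum by (service) (rate(http_requests_total{status=~"5.."}[5m]))
  /
sum by (service) (rate(http_requests_total[5m]))
  > 0.01

# bool modifier: keeps every series and returns 1 (true) or 0 (false) instead
up{job="api-gateway"} == bool 1
```

Division binds more tightly than comparison, so the ratio is computed first and then compared against the threshold.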

Offset and Subqueries

The offset modifier shifts the evaluation time of a selector backward. rate(metric[5m] offset 1h) gives you the rate from an hour ago; the same pattern with offset 1w gives week-over-week comparisons, and it is also useful in SLO error budget calculations. Subqueries (expr[range:step]) evaluate an instant-vector expression repeatedly over a range, enabling things like max_over_time(rate(metric[5m])[1h:1m]): the maximum 5-minute rate seen over the past hour, evaluated every minute.
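Both features in context, as a sketch against the hypothetical api-gateway job used earlier:

```promql
# Current request rate divided by the rate at the same time one week ago
sum(rate(http_requests_total{job="api-gateway"}[5m]))
  /
sum(rate(http_requests_total{job="api-gateway"}[5m] offset 1w))

# Peak 5-minute request rate over the past hour, evaluated at 1-minute steps
max_over_time(
  sum(rate(http_requests_total{job="api-gateway"}[5m]))[1h:1m]
)
```

A result above 1 in the first query means traffic is higher than it was a week ago. Note that offset attaches to the selector, directly after the range, not to the surrounding function call.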

Production Query Discipline

High-cardinality selectors (such as {__name__=~".+"} with no other filters) can OOM a Prometheus server. At large scale, teams commonly enforce these rules: always include at least one equality matcher on a low-cardinality label (like job), set --query.max-samples to cap how many samples a single query may load into memory, and route expensive historical queries to Thanos Querier or Cortex rather than the local Prometheus. Recording rules pre-compute costly aggregations so dashboards stay fast even at millions of series.
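A sketch of what this discipline looks like at the query level. The recorded series name below is hypothetical; it follows the common level:metric:operations naming convention and assumes a recording rule that stores sum by (job) (rate(http_requests_total[5m])).

```promql
# Risky: a regex on the metric name alone can select a very large number of series
{__name__=~"http_.+"}

# Bounded: an equality matcher on job limits how many series can match
{__name__=~"http_.+", job="api-gateway"}

# Dashboard query against the pre-computed series (hypothetical rule name)
job:http_requests:rate5m{job="api-gateway"}
```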