Prometheus & Grafana

Exporters & Service Discovery

18 min Lesson 5 of 32

Prometheus does not instrument your systems directly. Instead, it scrapes exporters — small HTTP servers that translate a third-party system's internal metrics into the Prometheus exposition format. Combined with service discovery, Prometheus can find and scrape every exporter in a dynamic fleet without a single static IP being written by hand.

node_exporter: the Foundation of Host Metrics

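node_exporter is the standard exporter for Linux/Unix host metrics. It listens on port 9100 and serves the /metrics endpoint. A typical host exposes over 1,000 time series (the exact count depends on hardware and on which collectors are enabled), covering CPU, memory, disk I/O, filesystems, network interfaces, clock drift, and, with the optional systemd collector enabled, systemd unit state.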

    # Deploy node_exporter as a systemd service (v1.8.x, 2025)
    wget https://github.com/prometheus/node_exporter/releases/download/v1.8.2/node_exporter-1.8.2.linux-amd64.tar.gz
    tar xvf node_exporter-1.8.2.linux-amd64.tar.gz
    install -m 0755 node_exporter-1.8.2.linux-amd64/node_exporter /usr/local/bin/

    cat > /etc/systemd/system/node_exporter.service <<'UNIT'
    [Unit]
    Description=Prometheus Node Exporter
    After=network.target

    [Service]
    User=node_exporter
    ExecStart=/usr/local/bin/node_exporter \
      --collector.filesystem.mount-points-exclude='^/(dev|proc|sys|var/lib/docker/.+)($|/)' \
      --collector.systemd \
      --collector.processes \
      --web.listen-address=:9100
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target
    UNIT

    useradd -rs /bin/false node_exporter
    systemctl daemon-reload && systemctl enable --now node_exporter

    # Smoke test
    curl -s http://localhost:9100/metrics | grep '^node_cpu_seconds_total' | head -3

Always exclude /proc, /sys, and Docker overlay mounts from the filesystem collector. Without the exclusion flag, node_exporter generates hundreds of spurious filesystem metrics for container layers — enough to cause cardinality problems in Prometheus.
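
Once the exporter is running, Prometheus still needs a scrape job that points at it. The fragment below is a minimal sketch, assuming two placeholder hosts (host-a.example.internal and host-b.example.internal) and a job named node, chosen to match the up{job="node"} query used later in this lesson.

```yaml
# prometheus.yml fragment (sketch; hostnames are placeholders, not real hosts)
scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets:
          - 'host-a.example.internal:9100'
          - 'host-b.example.internal:9100'
```

A static list like this is workable for a handful of hosts. In a dynamic fleet, a service discovery mechanism replaces it.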

The Exporter Ecosystem

Beyond node_exporter, the official and community exporter ecosystem covers nearly every piece of infrastructure you will encounter at scale:

  • blackbox_exporter — probes HTTP, HTTPS, DNS, TCP, ICMP from the outside. Essential for synthetic monitoring and SSL expiry alerting.
  • mysqld_exporter / postgres_exporter — per-database metrics: query latency, connection pool saturation, replication lag, slow queries.
  • redis_exporter — keyspace hits/misses, memory fragmentation ratio, connected clients, replication offset.
  • kube-state-metrics — Kubernetes object state (Deployment replicas desired vs. ready, Pod restarts, PVC bound status). Distinct from cAdvisor, which measures resource consumption.
  • process-exporter — per-process CPU and memory when you need to track individual daemons without full host granularity.
  • kafka_exporter / rabbitmq_exporter — queue depth, consumer lag, partition leader distribution.
Any application can become its own exporter by exposing a /metrics endpoint using a Prometheus client library (Go, Java, Python, Ruby, Rust). In production, reserve the Pushgateway for short-lived batch jobs that exit before they can be scraped; long-lived services should expose /metrics directly.
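
blackbox_exporter is scraped differently from most exporters in the list above: Prometheus calls its /probe endpoint and passes the real target as a URL parameter. The fragment below sketches the usual relabeling pattern. It assumes the exporter runs at the placeholder address blackbox-exporter.example.internal:9115 and that a module named http_2xx is defined in the exporter's own configuration.

```yaml
# prometheus.yml fragment (sketch; addresses and URLs are placeholders)
scrape_configs:
  - job_name: 'blackbox-http'
    metrics_path: /probe
    params:
      module: [http_2xx]          # module defined in the blackbox_exporter config
    static_configs:
      - targets:
          - https://example.com   # the site to probe, not the exporter itself
    relabel_configs:
      # Pass the listed target to the exporter as ?target=<url>
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed URL as the instance label on every series
      - source_labels: [__param_target]
        target_label: instance
      # Send the actual scrape to the exporter
      - target_label: __address__
        replacement: blackbox-exporter.example.internal:9115
```

With a job like this in place, an expression such as probe_ssl_earliest_cert_expiry - time() < 14 * 86400 could drive a certificate expiry alert. The 14-day threshold is an arbitrary choice for illustration.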

kubernetes_sd: Service Discovery in Dynamic Clusters

In Kubernetes, pods are ephemeral: IPs change with every rollout, and static scrape_configs cannot keep up. kubernetes_sd_configs solves this by watching the Kubernetes API and surfacing targets grouped by role: node, pod, service, endpoints, endpointslice, or ingress.

[Figure: Prometheus Kubernetes Service Discovery and Relabeling Flow. Kubernetes API Server → kubernetes_sd (role: endpoints, role: pod, role: node) → Relabeling (keep / drop, replace labels, set __address__) → Prometheus Scrape Loop over the filtered target set (pod A:8080, pod B:8080).]
Prometheus uses kubernetes_sd to discover targets from the Kubernetes API, then relabeling filters and reshapes the target set before scraping.

    # prometheus.yml: scrape Pods that have the annotation
    # prometheus.io/scrape: "true"
    scrape_configs:
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          # Drop pods not opted in
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: "true"
          # Use custom port annotation if present: rewrite __address__
          # to <pod IP>:<annotated port>
          - source_labels: [__meta_kubernetes_pod_ip, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            regex: (.+);(\d+)
            replacement: $1:$2
            target_label: __address__
          # Carry namespace and pod name into every series
          - source_labels: [__meta_kubernetes_namespace]
            target_label: namespace
          - source_labels: [__meta_kubernetes_pod_name]
            target_label: pod
          - source_labels: [__meta_kubernetes_pod_label_app]
            target_label: app
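
For the job above to pick up a workload, the Pod has to carry the annotations that the relabel rules read. The manifest below is a sketch of an opted-in Pod. The pod name, image, and port are hypothetical, and the prometheus.io/* annotations are a convention interpreted by the relabel rules, not something Kubernetes or Prometheus treats specially.

```yaml
# Hypothetical Pod that opts in to the 'kubernetes-pods' job
apiVersion: v1
kind: Pod
metadata:
  name: example-app
  namespace: default
  labels:
    app: example-app               # becomes the app label on every series
  annotations:
    prometheus.io/scrape: "true"   # matched by the keep rule
    prometheus.io/port: "8080"     # used to rewrite __address__
spec:
  containers:
    - name: app
      image: registry.example.internal/example-app:1.0.0
      ports:
        - containerPort: 8080
```

Annotation values must be quoted strings. Characters such as "." and "/" in the annotation name are turned into underscores in the meta-label, which is why prometheus.io/scrape is read as __meta_kubernetes_pod_annotation_prometheus_io_scrape.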

Relabeling: Shaping the Target Set

Relabeling is the most powerful — and most misunderstood — feature in Prometheus configuration. It operates at two stages: relabel_configs (before a scrape, determines whether and how to scrape a target) and metric_relabel_configs (after a scrape, on each time series returned).

Every target carries a set of meta-labels prefixed with __meta_ that are populated by the SD mechanism. These are never stored in the TSDB — they exist only during relabeling to let you build real labels.

  • keep — only keep targets whose source label matches the regex; drop all others.
  • drop — inverse of keep. Useful to suppress noisy exporters per namespace.
  • replace — extract a value (optionally with a capture group) and write it into a target label.
  • labelmap — fan out matching meta-labels into real labels. Common use: promote all __meta_kubernetes_pod_label_* to top-level labels.
  • labeldrop / labelkeep — remove or keep only specific label names from the final series.
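
The actions in the list above can be combined within one job. The sketch below shows drop at the target stage and labeldrop at the series stage. The namespace names and the dropped label name are hypothetical and chosen only for illustration.

```yaml
# prometheus.yml fragment (sketch; namespace and label names are assumptions)
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # drop: skip every target discovered in these namespaces
      - source_labels: [__meta_kubernetes_namespace]
        action: drop
        regex: (sandbox|load-test)
    metric_relabel_configs:
      # labeldrop: the regex matches label NAMES, not values
      - action: labeldrop
        regex: pod_template_hash
```

When using labeldrop, check that the remaining labels still identify each series uniquely. Removing a distinguishing label makes previously separate series collide.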

    # metric_relabel_configs example: drop high-cardinality kubelet metrics
    # that are scraped but never queried, to reduce TSDB ingestion cost
    # (fragment: this job belongs under scrape_configs)
    - job_name: 'kubelet'
      kubernetes_sd_configs:
        - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
      metric_relabel_configs:
        # Drop per-container resource metrics that are never queried
        - source_labels: [__name__]
          regex: 'container_(cpu_cfs_throttled_seconds_total|network_tcp_usage_total)'
          action: drop
        # Copy a short 12-character container ID out of the long cgroup path
        # in the id label (the id label itself is left in place)
        - source_labels: [id]
          regex: '/kubepods/.+/pod.+/(.{12}).+'
          replacement: '$1'
          target_label: container_id

Never use labelmap to blindly promote all Kubernetes pod labels (__meta_kubernetes_pod_label_.*) into Prometheus labels without auditing them first. If a developer adds a label like git-commit-sha or a UUID-bearing label to pods, every unique value becomes a new series. This is a common cause of cardinality explosions in Prometheus at scale. Always use an allowlist: regex: __meta_kubernetes_pod_label_(app|version|env|team).
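
Written out as a relabel rule, the allowlist looks roughly like the sketch below. The four label names come from the regex above. Which labels are actually safe to promote depends on your own workloads.

```yaml
# relabel_configs fragment (sketch): promote only audited pod labels
relabel_configs:
  - action: labelmap
    # Only these four pod labels become Prometheus labels;
    # the capture group supplies the new label name
    regex: __meta_kubernetes_pod_label_(app|version|env|team)
    replacement: $1
```

Any pod label not named in the regex stays a meta-label and is discarded once relabeling finishes.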

Federation and the Hierarchy Pattern

At fleet scale, a single Prometheus server eventually cannot keep up with scraping every target directly, for example 10,000 nodes. The standard architecture uses hierarchical federation: per-datacenter or per-cluster Prometheus servers scrape local exporters, and a top-level global Prometheus server federates only the pre-aggregated series produced by their recording rules. The /federate endpoint supports this through its match[] parameter, which selects the series to expose.
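
On the global server, federation is an ordinary scrape job pointed at /federate. The sketch below makes two assumptions: the downstream servers are reachable at the placeholder addresses shown, and their recording rules use names starting with job:, so the selector picks up only aggregated series.

```yaml
# prometheus.yml fragment for the global server (sketch; targets are placeholders)
scrape_configs:
  - job_name: 'federate'
    scrape_interval: 30s
    honor_labels: true            # keep the job/instance labels set downstream
    metrics_path: /federate
    params:
      'match[]':
        - '{__name__=~"job:.*"}'  # assumed recording-rule naming scheme
    static_configs:
      - targets:
          - 'prometheus-dc1.example.internal:9090'
          - 'prometheus-dc2.example.internal:9090'
```

Keeping the match[] selector narrow is the point of the pattern. Federating raw series would move the scrape load to the global server instead of reducing it.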

In Kubernetes environments, a widely used approach at large scale is the Prometheus Operator, commonly installed through the kube-prometheus-stack Helm chart. The chart ships the ServiceMonitor and PodMonitor CRDs, default alerts (via kube-prometheus rules), and a bundled Grafana instance. You define scrape targets declaratively as Kubernetes objects rather than editing prometheus.yml directly: the Operator watches ServiceMonitor/PodMonitor resources and regenerates the config automatically.
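
A ServiceMonitor for a single application might look like the sketch below. The application name, namespaces, and port name are hypothetical. The release label is also an assumption: it reflects the common case where the Prometheus resource only selects ServiceMonitors that carry the Helm release name as a label.

```yaml
# Hypothetical ServiceMonitor (sketch; names and labels are assumptions)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: monitoring
  labels:
    release: kube-prometheus-stack   # must match the Prometheus resource's selector
spec:
  namespaceSelector:
    matchNames:
      - default                      # namespace where the Service lives
  selector:
    matchLabels:
      app: example-app               # selects Services, not Pods
  endpoints:
    - port: http-metrics             # name of the Service port, not a number
      path: /metrics
      interval: 30s
```

If targets do not appear after applying such an object, the label selector on the Prometheus resource and the port name on the Service are the first two things worth checking.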

Production Failure Modes to Know

Understanding what breaks in real environments is as important as knowing what to configure:

  • Scrape timeout vs. scrape interval — if a target takes longer to respond than the scrape_timeout, Prometheus marks it as up=0 and emits no metrics for that cycle. The default timeout (10 s) is often too short for MySQL or JVM exporters under GC pressure. Tune per job.
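  • SD cache staleness — kubernetes_sd does not poll on a refresh_interval; it holds watch connections to the Kubernetes API and normally sees changes within seconds. Polling-based mechanisms do have one: file_sd_configs, for example, re-reads its files every refresh_interval (default 5 minutes) in addition to watching them for changes. With either kind, a pod that dies and respawns before the update arrives may be scraped at a stale address. The endpoints role, which follows Service endpoint membership, is often a better fit than the pod role when targets need to track rollouts closely.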
  • RBAC missing — Prometheus needs a ClusterRole granting get/list/watch on nodes, pods, services, and endpoints, plus a ClusterRoleBinding to its service account. Missing permissions surface only as 403 errors in the Prometheus logs; the affected job simply shows zero targets.
  • node_exporter port blocked — Security groups or NetworkPolicy rules that block port 9100 are a common gotcha after cluster hardening. Symptoms: up{job="node"} == 0.
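
For the RBAC item in the list above, the permissions can be granted with a manifest along the lines of the sketch below. It assumes a ServiceAccount named prometheus in a namespace named monitoring. Both names are placeholders for whatever your deployment uses.

```yaml
# RBAC for kubernetes_sd (sketch; ServiceAccount name and namespace are assumptions)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources: [nodes, nodes/metrics, services, endpoints, pods]
    verbs: [get, list, watch]
  - nonResourceURLs: ["/metrics"]
    verbs: [get]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: monitoring
```

Discovery roles that read other resource types, such as ingress, need those resources added to the rules as well.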