Prometheus & Grafana

Exporters & Service Discovery


Prometheus does not instrument your systems directly. Instead, it scrapes exporters — small HTTP servers that translate a third-party system's internal metrics into the Prometheus exposition format. Combined with service discovery, Prometheus can find and scrape every exporter in a dynamic fleet without a single static IP being written by hand.

node_exporter: the Foundation of Host Metrics

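node_exporter is the standard exporter for Linux/Unix host metrics. It listens on port 9100 and serves its metrics at the /metrics endpoint. It exposes over 1,000 metrics covering CPU, memory, disk I/O, filesystems, network interfaces, and clock offset. Some collectors, including the systemd unit state and per-process collectors, are disabled by default and must be switched on with flags such as --collector.systemd.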

# Deploy node_exporter as a systemd service (v1.8.x, 2025)
wget https://github.com/prometheus/node_exporter/releases/download/v1.8.2/node_exporter-1.8.2.linux-amd64.tar.gz
tar xvf node_exporter-1.8.2.linux-amd64.tar.gz
install -m 0755 node_exporter-1.8.2.linux-amd64/node_exporter /usr/local/bin/

cat > /etc/systemd/system/node_exporter.service <<'UNIT'
[Unit]
Description=Prometheus Node Exporter
After=network.target

[Service]
User=node_exporter
ExecStart=/usr/local/bin/node_exporter \
  --collector.filesystem.mount-points-exclude='^/(dev|proc|sys|var/lib/docker/.+)($|/)' \
  --collector.systemd \
  --collector.processes \
  --web.listen-address=:9100
Restart=on-failure

[Install]
WantedBy=multi-user.target
UNIT

useradd -rs /bin/false node_exporter
systemctl daemon-reload && systemctl enable --now node_exporter

# Smoke test
curl -s http://localhost:9100/metrics | grep '^node_cpu_seconds_total' | head -3

Always exclude /proc, /sys, and Docker overlay mounts from the filesystem collector. Without the exclusion flag, node_exporter generates hundreds of spurious filesystem metrics for container layers — enough to cause cardinality problems in Prometheus.
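Once the exporter is running, Prometheus still needs a scrape job that points at it. A minimal sketch, assuming Prometheus runs on the same host and that the job is named node (the job name and the target address are illustrative assumptions):

```yaml
# prometheus.yml (fragment). Job name and target are assumptions.
scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets:
          - 'localhost:9100'   # the node_exporter listen address set in the unit file
```

A static target like this is fine for one host; larger fleets are better served by a service discovery mechanism.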

The Exporter Ecosystem

Beyond node_exporter, the official and community exporter ecosystem covers nearly every piece of infrastructure you will encounter at scale:

blackbox_exporter — probes HTTP, HTTPS, DNS, TCP, ICMP from the outside. Essential for synthetic monitoring and SSL expiry alerting.
mysqld_exporter / postgres_exporter — per-database metrics: query latency, connection pool saturation, replication lag, slow queries.
redis_exporter — keyspace hits/misses, memory fragmentation ratio, connected clients, replication offset.
kube-state-metrics — Kubernetes object state (Deployment replicas desired vs. ready, Pod restarts, PVC bound status). Distinct from cAdvisor which measures resource consumption.
process-exporter — per-process CPU and memory when you need to track individual daemons without full host granularity.
kafka_exporter / rabbitmq_exporter — queue depth, consumer lag, partition leader distribution.
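blackbox_exporter is scraped differently from the other exporters in this list: Prometheus calls the exporter's /probe endpoint and passes the real target as a URL parameter. A sketch of the usual relabeling pattern, assuming the exporter is reachable at blackbox-exporter:9115 and has an http_2xx module configured (both are assumptions about your deployment, and https://example.com is a placeholder):

```yaml
scrape_configs:
  - job_name: 'blackbox-http'
    metrics_path: /probe
    params:
      module: [http_2xx]            # assumed module name in blackbox.yml
    static_configs:
      - targets:
          - https://example.com     # placeholder URL to probe
    relabel_configs:
      # Pass the listed URL to the exporter as ?target=...
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed URL as the instance label
      - source_labels: [__param_target]
        target_label: instance
      # Send the actual scrape to the exporter itself
      - target_label: __address__
        replacement: blackbox-exporter:9115   # assumed exporter address
```

The order matters: __address__ is overwritten last, after its original value has been copied into the probe parameter.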

Any application can expose its own metrics without a separate exporter by serving a /metrics endpoint through a Prometheus client library (Go, Java, Python, Ruby, Rust). In production, reserve the Pushgateway for short-lived batch jobs that exit before they can be scraped; long-lived services should expose /metrics directly.
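When a batch job does push its metrics, Prometheus scrapes the gateway like any other target. A sketch, assuming the gateway is reachable at pushgateway.example.internal:9091 (a hypothetical address); honor_labels keeps the job and instance labels that the batch job pushed instead of overwriting them with the gateway's own:

```yaml
scrape_configs:
  - job_name: 'pushgateway'
    honor_labels: true      # keep the labels attached by the pushing job
    static_configs:
      - targets:
          - 'pushgateway.example.internal:9091'   # hypothetical address
```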

kubernetes_sd: Service Discovery in Dynamic Clusters

In Kubernetes, pods are ephemeral: their IPs change with every rollout, and static scrape_configs cannot keep up. kubernetes_sd_configs solves this by watching the Kubernetes API and surfacing targets grouped by role: node, pod, service, endpoints, endpointslice, or ingress.

Prometheus uses kubernetes_sd to discover targets from the Kubernetes API, then relabeling filters and reshapes the target set before scraping.

# prometheus.yml — scrape Pods that have the annotation
# prometheus.io/scrape: "true"
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Drop pods not opted in
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Use the custom port annotation if present: the pod IP and the
      # annotated port are joined with ";" and rewritten into a
      # host:port __address__. Pods without the annotation do not match
      # the regex and keep their discovered address.
      - source_labels: [__meta_kubernetes_pod_ip,
                        __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: (.+);(\d+)
        replacement: $1:$2
        target_label: __address__
      # Carry namespace and pod name into every series
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
      - source_labels: [__meta_kubernetes_pod_label_app]
        target_label: app
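For a pod to be picked up by this job it has to carry the matching annotations. A sketch of an opted-in pod, assuming a hypothetical application named example-app that serves metrics on port 8080 (the name, image, and port are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app
  namespace: default
  labels:
    app: example-app                # surfaces as the app label via the last rule
  annotations:
    prometheus.io/scrape: "true"    # annotation values must be strings
    prometheus.io/port: "8080"
spec:
  containers:
    - name: example-app
      image: registry.example.com/example-app:1.0.0   # placeholder image
      ports:
        - containerPort: 8080
```

The prometheus.io/* annotations are only a convention. Prometheus gives them no special meaning; they work because the relabel rules read them.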

Relabeling: Shaping the Target Set

Relabeling is one of the most powerful, and most misunderstood, features in Prometheus configuration. It operates at two stages: relabel_configs runs before a scrape and determines whether and how a target is scraped; metric_relabel_configs runs after a scrape, on each time series returned, before the samples are written to storage.

Every target carries a set of meta-labels prefixed with __meta_ that are populated by the SD mechanism. These are never stored in the TSDB — they exist only during relabeling to let you build real labels.

keep — only keep targets whose source label matches the regex; drop all others.
drop — inverse of keep. Useful to suppress noisy exporters per namespace.
replace — extract a value (optionally with a capture group) and write it into a target label.
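labelmap — match the regex against label names (not values) and copy each matching label's value to a new label named by the replacement. Common use: promote selected __meta_kubernetes_pod_label_* labels to top-level labels.
labeldrop / labelkeep — remove, or keep only, the labels whose names match the regex. Most often used in metric_relabel_configs to strip labels from scraped series.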
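A short sketch of three of these actions side by side. The namespace load-test and the label pod_template_hash are hypothetical examples, not names Prometheus defines:

```yaml
relabel_configs:
  # drop: skip every target in one namespace (hypothetical name)
  - source_labels: [__meta_kubernetes_namespace]
    action: drop
    regex: load-test
  # replace: copy the node a pod runs on into a "node" label
  - source_labels: [__meta_kubernetes_pod_node_name]
    action: replace
    target_label: node
metric_relabel_configs:
  # labeldrop: strip one label (hypothetical name) from every scraped series
  - action: labeldrop
    regex: pod_template_hash
```

Relabel regexes are anchored at both ends, so load-test matches only that exact namespace and not load-test-2.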

# metric_relabel_configs example: drop high-cardinality cAdvisor metrics
# that are scraped but never queried, to reduce TSDB ingestion cost
- job_name: 'kubelet'
  kubernetes_sd_configs:
    - role: node
  scheme: https
  # container_* series come from the kubelet's embedded cAdvisor endpoint,
  # not from the default /metrics path
  metrics_path: /metrics/cadvisor
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
    # Copy node labels onto the target
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
  metric_relabel_configs:
    # Drop two per-container series that nothing queries
    - source_labels: [__name__]
      regex: 'container_(cpu_cfs_throttled_seconds_total|network_tcp_usage_total)'
      action: drop
    # Copy the first 12 characters of the container ID out of the cgroup
    # path in id. This adds container_id; the id label itself is kept.
    - source_labels: [id]
      regex: '/kubepods/.+/pod.+/(.{12}).+'
      replacement: '$1'
      target_label: container_id

Never use labelmap to blindly promote all Kubernetes pod labels (__meta_kubernetes_pod_label_.*) into Prometheus labels without auditing them first. If a developer adds a label like git-commit-sha or a UUID-bearing label to pods, every unique value creates a new set of series. This is a common cause of cardinality explosions in Prometheus at scale. Always use an allowlist: regex: __meta_kubernetes_pod_label_(app|version|env|team).
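Written out as a full rule, the allowlist looks like this. The four label names are the ones from the text above; substitute whatever your own audit approves:

```yaml
relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(app|version|env|team)
    # replacement defaults to $1, so the resulting labels are named
    # app, version, env, and team
```

Kubernetes label names are sanitized before relabeling sees them: characters that are not valid in a Prometheus label name, such as dots, dashes, and slashes, become underscores. A pod label app.kubernetes.io/name therefore has to be matched as app_kubernetes_io_name.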

Federation and the Hierarchy Pattern

At fleet scale, a single Prometheus server is rarely enough to scrape 10,000 nodes. The standard architecture uses hierarchical federation: per-datacenter or per-cluster Prometheus servers scrape local exporters, and a top-level global Prometheus server federates only the pre-aggregated series produced by their recording rules. The /federate endpoint supports this with the match[] parameter, which selects the series to return.
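On the global server, federation is just another scrape job. A sketch, assuming the lower-level servers are reachable at the hypothetical addresses shown and that their recording rules follow a job: naming prefix (both are assumptions):

```yaml
scrape_configs:
  - job_name: 'federate'
    honor_labels: true          # keep the labels set by the lower-level servers
    metrics_path: /federate
    params:
      'match[]':
        - '{__name__=~"job:.*"}'   # assumed recording-rule naming scheme
    static_configs:
      - targets:
          - 'prometheus-dc1.example.internal:9090'   # hypothetical
          - 'prometheus-dc2.example.internal:9090'   # hypothetical
```

Keeping the selector narrow is the point of the pattern: federating raw series would recreate the single-server bottleneck one level up.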

In Kubernetes environments, a widely adopted approach is the Prometheus Operator, usually installed through the kube-prometheus-stack Helm chart. The chart ships the Operator's CRDs (including ServiceMonitor and PodMonitor), default alerts from the kube-prometheus rules, and a bundled Grafana instance. You define scrape targets declaratively as Kubernetes objects rather than editing prometheus.yml directly: the Operator watches ServiceMonitor and PodMonitor resources and regenerates the configuration automatically.
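A ServiceMonitor for a single application might look like the sketch below. It assumes a hypothetical Service labeled app: example-app with a port named metrics, and a Helm release called kube-prometheus-stack whose Prometheus selects ServiceMonitors by the release label; all of these depend on how your chart is installed:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: monitoring
  labels:
    release: kube-prometheus-stack   # assumed selector label of the Prometheus instance
spec:
  namespaceSelector:
    matchNames:
      - default                      # namespace of the target Service
  selector:
    matchLabels:
      app: example-app               # hypothetical Service label
  endpoints:
    - port: metrics                  # name of the Service port, not a number
      path: /metrics
      interval: 30s
```

If the labels on the ServiceMonitor do not match what the Prometheus resource selects, the object is accepted by the API server but never turns into a scrape job.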

Production Failure Modes to Know

Understanding what breaks in real environments is as important as knowing what to configure:

Scrape timeout vs. scrape interval — if a target takes longer to respond than scrape_timeout, Prometheus records up=0 for it and ingests no samples from that scrape. The default timeout (10 s) is often too short for MySQL or JVM exporters under GC pressure. Tune it per job, keeping scrape_timeout no greater than scrape_interval; Prometheus rejects a configuration that violates this.
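SD cache staleness — kubernetes_sd does not poll on a refresh_interval; it holds list/watch connections to the API server, so target changes normally propagate quickly. refresh_interval applies to polling mechanisms such as file_sd_configs (default 5 minutes) and dns_sd_configs (default 30 seconds). With those, a target that dies and respawns within the window may be scraped at a stale address. If kubernetes_sd targets look stale, check the Prometheus log for watch or API connectivity errors.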
RBAC missing — Prometheus needs a ClusterRole granting get/list/watch on nodes, pods, services, and endpoints, plus a ClusterRoleBinding to its service account. Missing permissions surface only as 403 (Forbidden) errors in the Prometheus log; the affected job simply discovers zero targets.
node_exporter port blocked — Security groups or NetworkPolicy rules that block port 9100 are a common gotcha after cluster hardening. Symptoms: up{job="node"} == 0.
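For the RBAC case, the permissions described above can be sketched as the following manifests. The object names and the monitoring namespace are assumptions; match them to the service account your Prometheus pod actually runs as:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus                  # assumed name
rules:
  - apiGroups: [""]
    resources: [nodes, pods, services, endpoints]
    verbs: [get, list, watch]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus                  # assumed name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus                # assumed service account
    namespace: monitoring           # assumed namespace
```

With these names, kubectl auth can-i list pods --as=system:serviceaccount:monitoring:prometheus is a quick way to confirm the binding took effect. Discovery roles beyond the four listed, such as endpointslice or ingress, need their own resources added to the rule.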