The Prometheus Model
You have spent previous tutorials building observability foundations: distributed tracing with Jaeger, structured logging with Loki, and the three-pillars mental model. Now it is time to go deep on the most widely adopted metrics system in the DevOps world. Prometheus is not merely a metrics database: it is a complete model for how to think about measurement in cloud-native systems. Understanding that model (its pull-based architecture, its time-series storage engine, and the ecosystem that surrounds it) is the prerequisite for everything else in this tutorial.
Google runs internal monitoring systems (Borgmon, then Monarch) that share Prometheus's core design philosophy. The Prometheus project, started at SoundCloud in 2012 and donated to the CNCF in 2016, made that philosophy available to everyone. Today it is the de facto standard metrics system for Kubernetes, and every major cloud provider offers a Prometheus-compatible service.
Pull-Based Scraping: The Fundamental Design Choice
Most metrics systems you may have encountered — StatsD, Graphite, many APM agents — are push-based: applications send metrics to a central collector. Prometheus inverts this. It is pull-based: Prometheus itself reaches out over HTTP to each target and scrapes an /metrics endpoint that the target exposes. This is not an implementation detail — it is a deliberate architectural decision with deep operational consequences.
The /metrics endpoint serves the Prometheus exposition format: a plain-text, line-oriented format that any HTTP client can read without special tooling. Each target is responsible for maintaining a current view of its own counters, gauges, histograms, and summaries, and for serving them on demand. Prometheus pulls this snapshot on a configurable interval — the scrape_interval — typically 15 or 30 seconds in production.
Why pull instead of push? Several engineering reasons converge:
- Health as a first-class signal: A failed scrape immediately surfaces as up == 0 for that target. You do not need a separate health-check system.
- Configuration lives with Prometheus, not targets: You control scrape frequency, timeouts, and relabeling centrally. Targets need only serve an endpoint; they do not need to know where Prometheus lives.
- Local debugging: You can curl http://my-service:8080/metrics at any time and see exactly what Prometheus sees. There is no invisible agent, no buffering, no retry queue to reason about.
- No metric loss from network partitions toward Prometheus: If Prometheus is temporarily unreachable, targets keep accumulating state in memory. When scraping resumes, the next scrape reflects the current state. Events that happened during the gap are still counted because counters are cumulative on the target; only the intermediate samples are missing.
In a pull model, a target that cannot be scraped is recorded as up == 0. In a push model, the absence of data is ambiguous: is the service dead, or just not sending? This difference makes alerting on service availability dramatically simpler and more reliable in a pull architecture.
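The pull model and the up signal can be sketched with nothing but the Python standard library. This is an illustration under stated assumptions, not how Prometheus itself is implemented: MetricsHandler, scrape, and demo are hypothetical names, and demo_requests_total is an invented metric. The target only serves /metrics; the scraper decides when to pull and derives up from whether the pull succeeded.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class MetricsHandler(BaseHTTPRequestHandler):
    """Hypothetical target: serves one fixed sample on /metrics."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = b"demo_requests_total 42\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def scrape(url, timeout=2.0):
    """Pull one snapshot and return (up, body): up is 1 on success, 0 on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 1, resp.read().decode()
    except OSError:  # covers refused connections, timeouts, and HTTP errors
        return 0, ""


def demo():
    """Scrape a live target, stop it, then scrape again."""
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)  # port 0 picks a free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{port}/metrics"

    alive = scrape(url)      # target is serving
    server.shutdown()
    server.server_close()
    dead = scrape(url)       # target is gone, so the pull fails
    return alive, dead


alive, dead = demo()
print("up while serving:", alive[0])
print("up after shutdown:", dead[0])
```

Note that the target never learns where the scraper lives, and the failure case needs no cooperation from the target at all: the scraper observes it directly.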
The Exposition Format and the Client Libraries
A /metrics response is plain text. Each sample line carries a metric name, an optional set of labels, and a value, while # HELP and # TYPE comment lines describe each metric.
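To make the shape of that text concrete, here is a minimal sketch that renders one metric family by hand. The function render_exposition and the sample values are hypothetical; a real service would normally rely on a client library, which also handles label-value escaping that this sketch omits.

```python
def render_exposition(name, help_text, metric_type, samples):
    """Render one metric family in the text exposition format.

    samples is a list of (labels_dict, value) pairs.
    Escaping of quotes, backslashes, and newlines in label values is omitted.
    """
    lines = [
        f"# HELP {name} {help_text}",
        f"# TYPE {name} {metric_type}",
    ]
    for labels, value in samples:
        if labels:
            pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


text = render_exposition(
    "http_requests_total",
    "Total HTTP requests handled.",
    "counter",
    [
        ({"method": "GET", "status": "200"}, 1027),
        ({"method": "POST", "status": "201"}, 3),
    ],
)
print(text, end="")
# # HELP http_requests_total Total HTTP requests handled.
# # TYPE http_requests_total counter
# http_requests_total{method="GET",status="200"} 1027
# http_requests_total{method="POST",status="201"} 3
```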
The Prometheus project maintains official client libraries for Go, Java/JVM, Python, and Ruby. Community-supported libraries exist for virtually every language. Instrumenting a Go HTTP server is a matter of importing prometheus/client_golang and registering metrics — the library handles thread-safe accumulation and HTTP exposition. In Kubernetes environments, the kube-state-metrics exporter exposes cluster state, and node_exporter exposes OS-level metrics from every node — you scrape these exactly like application endpoints.
The Time-Series Database (TSDB)
Prometheus stores all scraped samples in its embedded TSDB, a purpose-built time-series database optimized for write-heavy, append-only workloads across large numbers of labeled series. Understanding its storage model helps you operate it correctly and avoid the most common production failures.
Each time series is identified by a unique combination of a metric name and a set of labels — key-value pairs that provide dimensions. The series http_requests_total{method="GET", status="200", service="orders"} is a completely separate time series from http_requests_total{method="POST", status="201", service="orders"}. Every distinct label combination creates a new series. This is both Prometheus's greatest power and its most common footgun.
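A small sketch shows why label combinations matter. It models series identity as the metric name plus the full label set, and estimates the worst-case series count for one metric as the product of distinct values per label. The helper names series_key and worst_case_series, the label values, and the figure of 100,000 users are all illustrative assumptions.

```python
from math import prod


def series_key(name, labels):
    """A series is identified by its metric name plus its complete label set."""
    return (name, frozenset(labels.items()))


def worst_case_series(label_values):
    """Upper bound for one metric: product of distinct values per label."""
    return prod(len(values) for values in label_values.values())


a = series_key("http_requests_total", {"method": "GET", "status": "200", "service": "orders"})
b = series_key("http_requests_total", {"method": "POST", "status": "201", "service": "orders"})
# Same labels as `a` in a different order: still the same series.
c = series_key("http_requests_total", {"service": "orders", "status": "200", "method": "GET"})

print(a == b)  # False: different label values, separate series
print(a == c)  # True: label order does not matter

bounded = worst_case_series({
    "method": ["GET", "POST", "PUT", "DELETE"],
    "status": ["200", "201", "400", "404", "500"],
    "service": ["orders"],
})
print(bounded)  # 20

# Adding a hypothetical user_id label with 100,000 values multiplies the bound.
with_user_id = bounded * 100_000
print(with_user_id)  # 2000000
```

The bound is multiplicative, which is why a single unbounded label can dominate the series count of an otherwise modest metric.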
The TSDB organizes data into two layers:
- In-memory Head block: The most recent data (roughly the last two hours) lives in memory. Every scraped sample is also appended to a write-ahead log (WAL) on disk, which is fast, sequential I/O. On restart, the WAL replays to rebuild the head.
- Persistent blocks: Every two hours, the head block is compacted and written to a persistent block on disk. Blocks are immutable. A background compactor merges smaller blocks into larger ones (each spanning up to 10% of the configured retention time, or 31 days, whichever is smaller) to reduce query overhead across long time ranges.
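The block arithmetic can be sketched as follows. This assumes the rule given in the Prometheus storage documentation, that compacted blocks span up to 10% of the retention time or 31 days, whichever is smaller, and it further assumes that two-hour windows are aligned to the Unix epoch. The function names are hypothetical.

```python
from datetime import timedelta

HEAD_BLOCK_RANGE = timedelta(hours=2)   # span of a freshly written block
MAX_BLOCK_CAP = timedelta(days=31)      # hard upper limit on a compacted block


def max_block_span(retention):
    """Largest span a compacted block may cover for a given retention."""
    return min(retention / 10, MAX_BLOCK_CAP)


def block_start(ts_seconds):
    """Start of the two-hour window containing a Unix timestamp.

    Assumes windows are aligned to the Unix epoch.
    """
    size = int(HEAD_BLOCK_RANGE.total_seconds())
    return ts_seconds - ts_seconds % size


print(max_block_span(timedelta(days=15)))   # 1 day, 12:00:00
print(max_block_span(timedelta(days=730)))  # 31 days, 0:00:00
print(block_start(3 * 7200 + 15))           # 21600
```

With the default 15-day retention the cap works out to 36 hours, so long-range queries touch a modest number of blocks rather than one per two-hour window.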
Default retention is 15 days of local storage. For long-term retention, Prometheus supports a remote_write interface that streams samples to an external backend such as Thanos, Cortex, Mimir, or VictoriaMetrics, which handles multi-year retention at scale, typically backed by object storage.
Track the number of active series with prometheus_tsdb_head_series, and alert when it exceeds 80% of your capacity budget.
The Prometheus Ecosystem: Architecture Diagram
Prometheus does not operate in isolation. The production architecture involves a set of well-defined components, each with a specific role. Understanding what each component does — and what it does not do — prevents architectural mistakes that are expensive to undo.
Service Discovery: How Prometheus Knows What to Scrape
In static environments you could list scrape targets manually. In Kubernetes, where pods are born and die constantly, static configuration is impractical. Prometheus has built-in service discovery integrations for Kubernetes, Consul, AWS EC2, GCE, Azure, and DNS-SD. In a Kubernetes cluster, the typical setup uses the kubernetes_sd_configs mechanism with relabeling rules to filter and transform the discovered targets.
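The filtering step can be illustrated with a small simulation of a relabeling rule whose action is keep. This is a sketch of the semantics, not Prometheus code: the keep function and the pod names are hypothetical, and Python's re module stands in for the RE2 engine Prometheus uses. The meta label follows the Kubernetes discovery convention in which a prometheus.io/scrape pod annotation appears as __meta_kubernetes_pod_annotation_prometheus_io_scrape.

```python
import re


def keep(targets, source_labels, regex, separator=";"):
    """Keep only targets whose joined source label values fully match regex.

    Mirrors the idea of a relabel rule with action "keep": values are joined
    with the separator and the regex must match the whole string.
    """
    pattern = re.compile(regex)
    kept = []
    for labels in targets:
        joined = separator.join(labels.get(name, "") for name in source_labels)
        if pattern.fullmatch(joined):
            kept.append(labels)
    return kept


SCRAPE_ANNOTATION = "__meta_kubernetes_pod_annotation_prometheus_io_scrape"

# Hypothetical discovery results: one dict of labels per discovered pod.
discovered = [
    {"__meta_kubernetes_pod_name": "orders-7d9f", SCRAPE_ANNOTATION: "true"},
    {"__meta_kubernetes_pod_name": "batch-x2k1", SCRAPE_ANNOTATION: "false"},
    {"__meta_kubernetes_pod_name": "legacy-55ab"},  # no annotation at all
]

kept = keep(discovered, [SCRAPE_ANNOTATION], "true")
print([t["__meta_kubernetes_pod_name"] for t in kept])  # ['orders-7d9f']
```

Pods opt in to scraping by annotation, so new workloads are picked up without anyone editing a target list.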
The external_labels block, set in the global section of the Prometheus configuration, is critical in multi-cluster setups. When remote-writing to a central store like Thanos, these labels attach to every series so you can distinguish cluster="prod-us-east-1" from cluster="prod-eu-west-1" in global queries.
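The effect of external labels can be sketched as a simple merge applied to every outgoing series. The function with_external_labels is hypothetical, and the choice to let a label already present on the series win over an external label of the same name is an assumption of this sketch.

```python
def with_external_labels(series_labels, external_labels):
    """Return the label set a central store would see for one series.

    Assumption: a label already on the series takes precedence over an
    external label with the same name.
    """
    merged = dict(external_labels)
    merged.update(series_labels)
    return merged


series = {"__name__": "http_requests_total", "method": "GET", "service": "orders"}

# The same series as written by two Prometheus servers in different clusters.
east = with_external_labels(series, {"cluster": "prod-us-east-1"})
west = with_external_labels(series, {"cluster": "prod-eu-west-1"})

print(east["cluster"])  # prod-us-east-1
print(west["cluster"])  # prod-eu-west-1
print(east == west)     # False: the cluster label keeps them distinct
```

Without the cluster label, the two servers would write identical label sets and the central store could not tell their series apart.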
What Prometheus Is Not
Understanding Prometheus's intentional limitations prevents architectural mistakes. Prometheus is designed for numeric time-series metrics with bounded cardinality. It is explicitly not designed for:
- Log storage: Use Loki, Elasticsearch, or Splunk for logs. Do not encode log content into metric labels.
- Event tracking or billing: Prometheus may lose up to one scrape interval of data (15–30 seconds) on crash. It offers no durability guarantees for individual events. Use Kafka or a transactional store for billing-critical counts.
- Long-term retention out of the box: Default 15-day local storage. For years of history, remote_write to Thanos or Mimir from day one.
- High-cardinality dimensions: No user IDs, request IDs, or session tokens as labels. These belong in traces (Jaeger/Tempo) or logs.
In the next lesson you will go deep on the four metric types — counter, gauge, histogram, and summary — and the exposition format. With the pull model and TSDB architecture firmly in mind, those details will snap into place immediately.