Distributed Tracing
When a single HTTP request fans out across four microservices before returning a response, a slow P99 latency or a sporadic 500 error is almost impossible to diagnose with ordinary logs. Each service writes its own log lines, on its own clock, with its own format. Correlating them manually is error-prone and slow. Distributed tracing solves this by attaching a unique identifier to every request at its entry point and propagating that identifier — automatically — through every downstream call. The result is a complete, causal timeline of exactly where a request spent its time and where it failed.
Core Vocabulary
Before writing any code it helps to have the terminology straight:
- Trace: the entire end-to-end journey of one logical request, identified by a globally unique traceId.
- Span: a named, timed unit of work within a trace. Every service hop, every database call, every significant operation is modelled as a span. A span knows its traceId, its own spanId, and the parentSpanId of the operation that triggered it.
- Context propagation: the mechanism by which traceId and spanId cross service boundaries, typically as HTTP headers such as traceparent (W3C standard) or X-B3-TraceId (Zipkin/Brave legacy).
- Exporter: the component that sends completed spans to a tracing backend (Zipkin, Jaeger, or an OTLP-compatible collector such as Grafana Tempo).
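To make the relationship between these identifiers concrete, here is a minimal sketch in plain Java. The SpanIds class and its method names are hypothetical and are not part of Micrometer or any tracing library. The only external fact relied on is the W3C traceparent layout: version, 32-hex-character trace id, 16-hex-character parent id, and a flags byte.

```java
import java.security.SecureRandom;

// Hypothetical illustration type: holds the three identifiers a span carries.
final class SpanIds {
    private static final SecureRandom RANDOM = new SecureRandom();

    final String traceId;      // 32 lowercase hex chars, shared by every span in the trace
    final String spanId;       // 16 lowercase hex chars, unique to this span
    final String parentSpanId; // null for the root span

    private SpanIds(String traceId, String spanId, String parentSpanId) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentSpanId = parentSpanId;
    }

    // The entry point of a request creates the root span and with it the traceId.
    static SpanIds newRoot() {
        return new SpanIds(randomHex(16), randomHex(8), null);
    }

    // A child keeps the traceId and records the current span as its parent.
    SpanIds newChild() {
        return new SpanIds(traceId, randomHex(8), spanId);
    }

    // W3C header value: version "00", trace id, id of the calling span, flags "01" (sampled).
    String toTraceparent() {
        return "00-" + traceId + "-" + spanId + "-01";
    }

    // Simplification: an all-zero id is invalid in W3C Trace Context but is not checked here.
    private static String randomHex(int bytes) {
        byte[] buf = new byte[bytes];
        RANDOM.nextBytes(buf);
        StringBuilder sb = new StringBuilder(bytes * 2);
        for (byte b : buf) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```

Calling SpanIds.newRoot().newChild() yields a span whose traceId matches the root and whose parentSpanId equals the root's spanId, which is the information a tracing UI uses to draw the waterfall.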
Micrometer Tracing in Spring Boot 3
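Spring Boot 3 replaced the older Spring Cloud Sleuth library with Micrometer Tracing, a thin, vendor-neutral facade over tracing implementations (Brave/Zipkin or OpenTelemetry). Add three dependencies to pom.xml: spring-boot-starter-actuator, a tracing bridge, and a matching exporter. For the Brave/Zipkin combination these are typically micrometer-tracing-bridge-brave and zipkin-reporter-brave.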
Choose micrometer-tracing-bridge-otel if your organisation already uses an OpenTelemetry Collector, since OTLP is supported by most current tracing backends. Brave is simpler to get started with and has a smaller dependency footprint.
Minimal Configuration
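With those JARs on the classpath, auto-configuration activates tracing. Tune it in application.yml, typically by setting the sampling rate (management.tracing.sampling.probability), the exporter endpoint (management.zipkin.tracing.endpoint when exporting to Zipkin), and a logging pattern (for example via logging.pattern.level) that prints the traceId and spanId MDC values.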
The logging pattern injects traceId and spanId into every log line automatically. When a request fails you can copy the traceId from any log line and open Zipkin's UI to see the full waterfall.
Automatic Instrumentation
Most Spring Boot instrumentation is zero-code. Once the dependencies are present:
- Incoming HTTP requests: the server-side observation filter (ServerHttpObservationFilter in Spring MVC, with a reactive counterpart in WebFlux) starts a new trace, or joins an existing one if a traceparent header is present, and closes the span when the response is sent.
- Outgoing HTTP calls with RestTemplate or WebClient: the auto-configured builders instrument the client so that propagation headers are injected into every outbound request, and the downstream service automatically participates in the same trace.
- Spring Data / JDBC: when spring-boot-starter-data-jpa is on the classpath, database calls can appear as child spans named after the query. Depending on your setup this may require an additional JDBC observation library such as datasource-micrometer.
- Message listeners (Kafka, RabbitMQ): headers in the message record carry the trace context, and the listener instrumentation picks them up. Observation may need to be enabled explicitly on the listener container.
Always create RestTemplate or WebClient beans through the auto-configured builder. A plain new RestTemplate() bypasses the tracing instrumentation. Inject RestTemplateBuilder (synchronous) or the auto-configured WebClient.Builder (reactive) instead; both are pre-configured for tracing.
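The join-or-start decision on the server side and the header injection on the client side can be sketched without any framework. The Propagation and IncomingContext classes below are hypothetical and do not show how Spring implements this. The sketch assumes header names are already lowercased, accepts only version 00 of the W3C traceparent format, and always marks outgoing requests as sampled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical holder for what a server learns from an incoming request.
final class IncomingContext {
    final String traceId;
    final String parentSpanId; // null when this service starts a new trace

    IncomingContext(String traceId, String parentSpanId) {
        this.traceId = traceId;
        this.parentSpanId = parentSpanId;
    }
}

final class Propagation {
    // version 00, 32-hex trace id, 16-hex parent span id, 2-hex flags
    private static final Pattern TRACEPARENT =
            Pattern.compile("00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})");

    // Server side: join the caller's trace if a well-formed header is present,
    // otherwise start a new trace with a fresh traceId.
    static IncomingContext joinOrStart(Map<String, String> incomingHeaders) {
        String header = incomingHeaders.get("traceparent");
        if (header != null) {
            Matcher m = TRACEPARENT.matcher(header.trim());
            if (m.matches()) {
                return new IncomingContext(m.group(1), m.group(2));
            }
        }
        return new IncomingContext(randomHex(32), null);
    }

    // Client side: copy the outgoing headers and add the propagation header,
    // naming the current span as the parent of whatever the callee creates.
    static Map<String, String> inject(Map<String, String> outgoingHeaders,
                                      String traceId, String currentSpanId) {
        Map<String, String> copy = new HashMap<>(outgoingHeaders);
        copy.put("traceparent", "00-" + traceId + "-" + currentSpanId + "-01");
        return copy;
    }

    private static String randomHex(int chars) {
        String digits = "0123456789abcdef";
        StringBuilder sb = new StringBuilder(chars);
        for (int i = 0; i < chars; i++) {
            sb.append(digits.charAt(ThreadLocalRandom.current().nextInt(16)));
        }
        return sb.toString();
    }
}
```

A malformed header is treated the same as a missing one, so a bad upstream value results in a new trace rather than a failed request.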
Creating Custom Spans
Auto-instrumentation covers the infrastructure layer. For business-logic operations that are expensive or failure-prone (a third-party API call, a complex calculation, a cache lookup), you want a dedicated span so the waterfall shows exactly how long each one took. Inject Tracer and use its fluent API: create the span with tracer.nextSpan(), tag it, record any exception with span.error(ex), and end the span in a finally block.
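A minimal sketch of that usage follows. To stay runnable without any dependencies it declares small stand-in Tracer and Span interfaces. These stand-ins only mimic the method names mentioned in this section and are not the Micrometer classes. In a real service you would inject io.micrometer.tracing.Tracer instead. PriceService, its pricing rule, and the tag name are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in interfaces (NOT Micrometer types): just enough surface to show the pattern.
interface Span {
    Span name(String name);
    Span start();
    Span tag(String key, String value);
    Span error(Throwable t);
    void end();
}

interface Tracer {
    Span nextSpan();
}

// Stand-in implementation that records finished spans so the pattern can be inspected.
final class RecordingTracer implements Tracer {
    final List<RecordingSpan> finished = new ArrayList<>();

    @Override
    public Span nextSpan() {
        return new RecordingSpan(this);
    }
}

final class RecordingSpan implements Span {
    private final RecordingTracer owner;
    String name;
    final Map<String, String> tags = new LinkedHashMap<>();
    Throwable error;

    RecordingSpan(RecordingTracer owner) { this.owner = owner; }

    @Override public Span name(String name) { this.name = name; return this; }
    @Override public Span start() { return this; }
    @Override public Span tag(String key, String value) { tags.put(key, value); return this; }
    @Override public Span error(Throwable t) { this.error = t; return this; }
    @Override public void end() { owner.finished.add(this); }
}

// Hypothetical business service wrapping an expensive operation in its own span.
final class PriceService {
    private final Tracer tracer;

    PriceService(Tracer tracer) { this.tracer = tracer; }

    long priceInCents(String productId, int quantity) {
        Span span = tracer.nextSpan().name("calculate-price").start();
        try {
            span.tag("product.id", productId);
            if (quantity <= 0) {
                throw new IllegalArgumentException("quantity must be positive");
            }
            return 250L * quantity; // placeholder for the real calculation
        } catch (RuntimeException ex) {
            span.error(ex); // mark the span as failed, then let the caller see the exception
            throw ex;
        } finally {
            span.end(); // always runs, on success and on failure
        }
    }
}
```

With the real library you would typically also put the span in scope, for example with tracer.withSpan(span) in a try-with-resources block, so that nested calls and log lines see it as the current span.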
A few things to notice in this pattern:
- tracer.nextSpan() creates a child of the current active span, so it slots correctly into the existing trace hierarchy.
- span.tag() attaches key-value metadata that appears in the Zipkin/Jaeger span detail view, which is invaluable for filtering traces by product, user, tenant, or any business dimension.
- span.error(ex) records the exception and sets the span status to ERROR, surfacing it immediately in the tracing UI.
- The finally block is mandatory; an unclosed span leaks memory and never reaches the exporter.
Security Considerations
Trace context travels in plain request headers, so if an external client sends its own traceparent header, your services will join that trace, potentially leaking internal service topology to an attacker who can correlate timing data. Mitigate this by trusting incoming trace context only from authenticated internal callers (e.g., services that present a valid mTLS certificate or an internal service-to-service JWT). At the perimeter (API Gateway / edge service), strip and re-issue trace headers for requests arriving from the public internet.
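The stripping step at the perimeter might look like the sketch below. EdgeTraceHeaders and its methods are hypothetical, and how callerIsTrusted is determined (mTLS, internal JWT) is outside its scope. The header names it removes are the W3C pair (traceparent, tracestate) and the B3 variants (b3 and the X-B3-* family).

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical edge-side helper: drops caller-supplied trace context from untrusted requests.
final class EdgeTraceHeaders {

    // HTTP header names are case-insensitive, so compare in lowercase.
    static boolean isTraceHeader(String name) {
        String n = name.toLowerCase(Locale.ROOT);
        return n.equals("traceparent")
                || n.equals("tracestate")
                || n.equals("b3")
                || n.startsWith("x-b3-");
    }

    // Returns a copy of the headers. For untrusted callers every trace header is removed,
    // so the edge service starts a fresh trace instead of joining the caller's.
    static Map<String, String> sanitize(Map<String, String> headers, boolean callerIsTrusted) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : headers.entrySet()) {
            if (callerIsTrusted || !isTraceHeader(e.getKey())) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }
}
```

Because the untrusted request reaches the tracing layer without any trace headers, the normal join-or-start logic issues a new traceId, which covers the re-issue half of the mitigation.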
Additionally, be mindful of what you attach as span tags. A tag like user.id or request.body will be stored verbatim in the tracing backend. Treat the tracing system as an observability store, not a logging store, and avoid attaching PII or secrets as span attributes.
Sampling Strategy
Tracing 100 % of requests is fine in development. In production at meaningful scale, exporting every span to a backend creates non-trivial overhead and storage cost. Common strategies:
- Probabilistic (head-based): sample a fixed percentage (e.g. 10 %) decided at the trace root. Simple, predictable cost. Set with management.tracing.sampling.probability=0.1.
- Rate-limited: sample at most N traces per second regardless of load. Protects the backend during traffic spikes.
- Tail-based: buffer all spans and keep only traces that contain an error or exceed a latency threshold. Requires a collector that supports tail sampling (e.g. OpenTelemetry Collector with the tail_sampling processor). More operationally complex, but it captures every interesting trace without storing 100 % of traffic in the backend.
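The three decisions can be sketched in a few lines of plain Java. All class and method names here are hypothetical, and this is not how Micrometer, Brave, or the OpenTelemetry Collector implement sampling. The probabilistic variant derives its decision from the traceId so that every service reaches the same verdict for a given trace. The rate limiter uses simple fixed one-second windows and takes its clock as a parameter so it can be exercised deterministically.

```java
import java.util.function.LongSupplier;

final class Samplers {

    // Head-based, probabilistic: map the low 64 bits of the traceId onto [0, Long.MAX_VALUE]
    // and keep the trace if the value falls below probability * Long.MAX_VALUE.
    static boolean sampleByProbability(String traceId, double probability) {
        if (probability >= 1.0) return true;
        if (probability <= 0.0) return false;
        String low16 = traceId.substring(traceId.length() - 16);
        long value = Long.parseUnsignedLong(low16, 16) & Long.MAX_VALUE; // clear the sign bit
        return value < (long) (probability * Long.MAX_VALUE);
    }

    // Tail-based: decided only after the trace has finished, using what actually happened.
    static boolean keepCompletedTrace(boolean containsError, long durationMillis,
                                      long latencyThresholdMillis) {
        return containsError || durationMillis > latencyThresholdMillis;
    }
}

// Head-based, rate-limited: at most maxPerSecond traces per one-second window.
final class RateLimitedSampler {
    private final int maxPerSecond;
    private final LongSupplier clockMillis; // injected so the window logic is testable
    private long currentWindow = Long.MIN_VALUE;
    private int usedInWindow;

    RateLimitedSampler(int maxPerSecond, LongSupplier clockMillis) {
        this.maxPerSecond = maxPerSecond;
        this.clockMillis = clockMillis;
    }

    synchronized boolean trySample() {
        long window = clockMillis.getAsLong() / 1000;
        if (window != currentWindow) { // a new second has started: reset the budget
            currentWindow = window;
            usedInWindow = 0;
        }
        if (usedInWindow < maxPerSecond) {
            usedInWindow++;
            return true;
        }
        return false;
    }
}
```

In production the clock would be System::currentTimeMillis. The fixed-window limiter can admit a short burst across a window boundary, which is usually acceptable for sampling purposes.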
Running Zipkin Locally
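You can spin up a Zipkin instance in seconds with Docker by running the openzipkin/zipkin image with port 9411 published, for example docker run -d -p 9411:9411 openzipkin/zipkin.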
Point your service at http://localhost:9411, make a few HTTP calls, then open http://localhost:9411 in a browser. Click Run Query to see all traces, then click any trace to view its waterfall. Every span is labelled with its service name, operation name, duration, and any tags you added.
Summary
Distributed tracing turns a sea of disconnected log lines into a structured, visual timeline of every request. With Micrometer Tracing and three Maven dependencies, Spring Boot 3 instruments all HTTP server/client traffic, database calls, and messaging listeners automatically. Add custom spans for business-critical operations using the Tracer API, attach meaningful tags, always close spans in a finally block, and choose a sampling strategy that balances visibility against overhead. In the next lesson you will see how to complement traces with metrics and health dashboards to complete the observability picture.