Distributed Tracing & OpenTelemetry

Spans, Traces & Context Propagation

18 min Lesson 2 of 28

A distributed trace is not a single monolithic record — it is a directed acyclic graph of spans, stitched together by identifiers that travel with every request across every service boundary. To use distributed tracing effectively in production, you need to understand the data model precisely: what a span contains, how spans form a tree, and how trace context is propagated across network hops so that spans created in completely different processes can be assembled into a coherent picture.

Anatomy of a Span

A span is the atomic unit of distributed tracing. It represents a single unit of work: an inbound HTTP handler, an outbound gRPC call, a database query, a cache lookup, a background job step. Every span carries a fixed set of fields defined by the OpenTelemetry specification:

Trace ID — a 128-bit (16-byte) globally unique identifier for the entire request journey. All spans belonging to the same request share this ID. Typically encoded as a 32-character lowercase hex string: 4bf92f3577b34da6a3ce929d0e0e4736.
Span ID — a 64-bit (8-byte) unique identifier for this specific span within its trace. Encoded as 16 hex characters: 00f067aa0ba902b7.
Parent Span ID — the span ID of the immediate parent. The root span (entry point) has no parent (or an all-zero parent ID). This field is what creates the parent-child tree.
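Operation Name – the span's name (OTel simply calls it the span name): a short, low-cardinality description of the work, such as HTTP GET /api/orders, db.query SELECT orders, or redis GET. Keep per-request values such as IDs (order:8821) out of the name and record them as attributes instead.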
Start Time — high-resolution timestamp (nanoseconds since Unix epoch).
Duration — elapsed wall-clock time from start to end in nanoseconds.
Status — one of UNSET, OK, or ERROR. Setting ERROR on a span is what makes it surfaceable in backend UIs and tail-based sampling policies.
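Attributes (formerly "tags") – key-value pairs of structured metadata. OTel defines semantic conventions for common attributes: http.method, http.status_code, db.system, db.statement, net.peer.name (newer semantic-convention releases rename several of these, e.g. http.request.method, http.response.status_code, server.address). Add your own: order.id, user.tier, feature.flag.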
Events (formerly "logs") — timestamped annotations within a span's duration: exception stack traces, cache misses, retry attempts. Not separate records — they live inside the span.
Links — references to spans in other traces, used for message queues and async workflows where a consumer span is causally related to a producer span but not a direct child.
Kind — role classification: SERVER (handles an inbound call), CLIENT (makes an outbound call), PRODUCER/CONSUMER (message queue), INTERNAL (in-process work).
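To make the field list concrete, here is a minimal in-memory sketch of the span data model in Python. SpanRecord, new_trace_id, and new_span_id are hypothetical names for illustration, not OTel SDK classes; the enums simply mirror the kind and status values listed above, and real SDKs track more than this.

```python
import os
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List, Optional, Tuple


class SpanKind(Enum):
    INTERNAL = "INTERNAL"
    SERVER = "SERVER"
    CLIENT = "CLIENT"
    PRODUCER = "PRODUCER"
    CONSUMER = "CONSUMER"


class StatusCode(Enum):
    UNSET = "UNSET"
    OK = "OK"
    ERROR = "ERROR"


def _random_hex_id(n_bytes: int) -> str:
    # All-zero IDs are invalid in W3C Trace Context, so retry in that (unlikely) case.
    while True:
        raw = os.urandom(n_bytes)
        if any(raw):
            return raw.hex()


def new_trace_id() -> str:
    return _random_hex_id(16)  # 128-bit -> 32 lowercase hex chars


def new_span_id() -> str:
    return _random_hex_id(8)  # 64-bit -> 16 lowercase hex chars


@dataclass
class SpanRecord:
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]  # None marks the root span
    name: str
    kind: SpanKind = SpanKind.INTERNAL
    start_ns: int = field(default_factory=time.time_ns)
    end_ns: Optional[int] = None
    status: StatusCode = StatusCode.UNSET
    attributes: Dict[str, Any] = field(default_factory=dict)
    events: List[Tuple[int, str, Dict[str, Any]]] = field(default_factory=list)
    links: List[Tuple[str, str]] = field(default_factory=list)  # (trace_id, span_id)

    def add_event(self, name: str, attributes: Optional[Dict[str, Any]] = None) -> None:
        # Events live inside the span: a timestamp, a name, and their own attributes.
        self.events.append((time.time_ns(), name, dict(attributes or {})))

    def end(self) -> None:
        self.end_ns = time.time_ns()

    @property
    def duration_ns(self) -> Optional[int]:
        return None if self.end_ns is None else self.end_ns - self.start_ns


# Usage: a root SERVER span and a CLIENT child in the same trace.
root = SpanRecord(new_trace_id(), new_span_id(), None, "GET /api/orders", SpanKind.SERVER)
child = SpanRecord(root.trace_id, new_span_id(), root.span_id, "SELECT orders", SpanKind.CLIENT)
child.add_event("cache.miss", {"cache.key": "order:8821"})
child.end()
root.end()
```

The child shares the root's trace ID and points at the root through parent_span_id; that single field is all a backend needs to rebuild the tree.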

Attributes are your primary debugging lever. A trace tells you where time was spent. Attributes tell you why. At Google and Uber, span attributes include business context (user tier, experiment bucket, cart size) so engineers can immediately correlate latency spikes with specific traffic segments — without having to join across logs. Define your semantic conventions early and enforce them in a shared instrumentation library.

Parent-Child Relationships and the Trace Tree

Spans form a tree rooted at a single entry-point span. Every span except the root has exactly one parent. This structure gives you the waterfall view you see in Jaeger and Tempo: a visual timeline showing which spans ran sequentially and which ran in parallel, and exactly how much of the total request latency each span contributed.

Consider a checkout request flowing through four services. The API gateway creates the root span. It calls two downstream services concurrently — order-service and inventory-service — each creating a child span. Order-service then calls the payments database, creating a grandchild span. The resulting tree has four nodes, and the total request duration is determined by the critical path: the longest chain of sequential spans from root to leaf.

Left: the span tree with parent-child relationships. Right: the waterfall timeline — parallel spans overlap; the critical path (postgres query) determines total latency.
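The critical-path idea can be sketched in a few lines of Python. This is a simplified heuristic, assuming every parent waits for its children (it ignores fire-and-forget work and gaps between child calls); the span IDs and millisecond timings below are made up to match the checkout example, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None for the root span
    name: str
    start_ms: float
    end_ms: float


def children_by_parent(spans: List[Span]) -> Dict[Optional[str], List[Span]]:
    # Rebuild the tree purely from parent IDs, as a tracing backend does.
    tree: Dict[Optional[str], List[Span]] = {}
    for span in spans:
        tree.setdefault(span.parent_id, []).append(span)
    return tree


def critical_path(spans: List[Span]) -> List[str]:
    # Simplified heuristic: starting at the root, follow the child that finishes
    # last, assuming the parent could not complete until that child returned.
    tree = children_by_parent(spans)
    roots = tree.get(None, [])
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root span, found {len(roots)}")
    path = [roots[0]]
    while tree.get(path[-1].span_id):
        path.append(max(tree[path[-1].span_id], key=lambda s: s.end_ms))
    return [s.name for s in path]


# Illustrative timings (made up) for the checkout example.
checkout = [
    Span("a1", None, "api-gateway", 0, 120),
    Span("b2", "a1", "order-service", 5, 115),
    Span("c3", "a1", "inventory-service", 5, 40),
    Span("d4", "b2", "postgres query", 10, 110),
]
print(critical_path(checkout))  # ['api-gateway', 'order-service', 'postgres query']
```

Shortening inventory-service would not change total latency here; only the spans on the returned path matter.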

W3C Trace Context: The traceparent Header

For traces to work across service boundaries, the trace context must travel with the request. If Service A creates a root span and makes an HTTP call to Service B, Service B must receive the trace ID and the parent span ID so that the span it creates is correctly linked to Service A's span in the same trace. Without this propagation, you get disconnected islands of spans — useless for root cause analysis.

The W3C Trace Context standard (a W3C Recommendation, now widely supported by OTel, Jaeger, Zipkin, Datadog, and most major APM vendors) defines two HTTP headers for this purpose:

traceparent — carries the core context: version, trace ID, parent span ID, and trace flags.
tracestate – optional vendor-specific key-value pairs (for example, Datadog's dd entry carrying its sampling priority) that travel alongside without conflicting with the standard.
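The tracestate update rules are easy to get wrong. Below is a minimal Python sketch, assuming well-formed input: per the W3C Trace Context spec, a vendor that modifies its own entry moves it to the left-most position, and a list longer than 32 members is trimmed from the right. The helper names are hypothetical, the sketch does not validate key or value character sets, and the sample values are in the style of the spec's examples.

```python
from typing import List, Tuple

MAX_LIST_MEMBERS = 32  # W3C Trace Context cap on tracestate list-members


def parse_tracestate(header: str) -> List[Tuple[str, str]]:
    members = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue  # empty list-members are permitted; skip them
        key, sep, value = item.partition("=")
        if sep:  # this sketch silently drops malformed members
            members.append((key.strip(), value.strip()))
    return members


def update_tracestate(header: str, key: str, value: str) -> str:
    # Remove any existing entry for this vendor key, then put the updated entry
    # at the front (left-most), as the spec requires for modified members.
    members = [(k, v) for k, v in parse_tracestate(header) if k != key]
    members.insert(0, (key, value))
    # Over the limit: drop from the right-most end.
    return ",".join(f"{k}={v}" for k, v in members[:MAX_LIST_MEMBERS])


print(update_tracestate("rojo=00f067aa0ba902b7,congo=t61rcWkgMzE", "congo", "ucfJifl5GOE"))
# congo=ucfJifl5GOE,rojo=00f067aa0ba902b7
```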

The traceparent header has a precisely defined format: version-traceId-parentSpanId-flags. In practice it looks like this:

# W3C traceparent header format:
# <version>-<trace-id>-<parent-span-id>-<trace-flags>
#
#   version       = "00" (current W3C spec version)
#   trace-id      = 32 hex chars (128-bit) — shared by ALL spans in this trace
#   parent-span-id= 16 hex chars (64-bit)  — ID of the CALLING span (becomes the new span's parent)
#   trace-flags   = 2 hex chars bitfield   — bit 0: sampling flag (01 = sampled, 00 = not sampled)

traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
             ^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^ ^^
          version         trace-id              parent-span-id  flags

# tracestate — vendor extensions alongside the standard header
tracestate: vendor1=abc123,dd=s:1;t.dm:-0

# Example: API Gateway (root span) calls order-service via HTTP.
# API Gateway sets these headers on the outbound request:
GET /internal/orders HTTP/1.1
Host: order-service.svc.cluster.local
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
tracestate: rojo=00f067aa0ba902b7

# order-service receives the request, extracts the header:
#   - trace_id      = 4bf92f3577b34da6a3ce929d0e0e4736   (reuse — same trace)
#   - parent_span_id= 00f067aa0ba902b7                  (API Gateway span is the parent)
# Creates a new SERVER span with a fresh span_id (e.g. 89ad12c3b45e6f70)
# and then sets traceparent on every outbound call IT makes, using its new span_id as parent.
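Parsing and forwarding the header is mechanical. The following is a minimal Python sketch of what a propagator does for version 00 only (strict four-field format, lowercase hex, all-zero IDs rejected); parse_traceparent and child_traceparent are hypothetical helpers, not OTel APIs, and this sketch rejects other versions outright. When parsing fails, a real service would start a new trace rather than fail the request.

```python
import os
import re
from typing import NamedTuple, Optional

# version 00: 2 + 32 + 16 + 2 lowercase hex chars separated by dashes
_TRACEPARENT_V00 = re.compile(r"^(00)-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class TraceParent(NamedTuple):
    trace_id: str
    parent_id: str
    flags: int

    @property
    def sampled(self) -> bool:
        return bool(self.flags & 0x01)  # bit 0 is the sampled flag


def parse_traceparent(header: str) -> Optional[TraceParent]:
    match = _TRACEPARENT_V00.match(header.strip())
    if match is None:
        return None  # malformed or unsupported version: caller starts a new trace
    _, trace_id, parent_id, flags = match.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None  # all-zero IDs are invalid
    return TraceParent(trace_id, parent_id, int(flags, 16))


def child_traceparent(incoming: TraceParent, span_id: Optional[str] = None) -> str:
    # Same trace ID and flags; this service's new span ID becomes the parent-id.
    new_id = span_id or os.urandom(8).hex()
    return f"00-{incoming.trace_id}-{new_id}-{incoming.flags:02x}"


incoming = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
print(incoming.sampled)  # True
print(child_traceparent(incoming, "89ad12c3b45e6f70"))
# 00-4bf92f3577b34da6a3ce929d0e0e4736-89ad12c3b45e6f70-01
```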

The sampling flag is advisory, not enforcement. The 01 flag in traceparent signals "I sampled this trace; downstream services, please also sample and report spans." But a downstream service is free to ignore it (e.g. if it is overloaded). In practice, production systems respect the flag to ensure all spans for a sampled trace are collected. When the flag is 00, downstream services typically do not report spans, keeping overhead near zero for unsampled traffic. The OTel SDK handles this automatically when you combine the W3C Trace Context propagator (TraceContextTextMapPropagator in Python, W3CTraceContextPropagator in Java and JavaScript) with a parent-based sampler; the SDK's default sampler, ParentBased with an AlwaysOn root, honors the incoming flag.
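The parent-based decision can be sketched as follows. This mirrors the idea behind the SDK's default ParentBased sampler but is a simplified illustration: should_sample is a hypothetical helper, and the root-level ratio check on the lower 64 bits of the trace ID is an assumption chosen so that every service computing it for the same trace ID reaches the same answer.

```python
from typing import Optional

TRACE_ID_SPACE = 2 ** 64  # we only look at the lower 64 bits of the trace ID


def should_sample(trace_id_hex: str, parent_sampled: Optional[bool], root_ratio: float) -> bool:
    # 1. A parent exists: honour its sampled flag (bit 0 of trace-flags).
    if parent_sampled is not None:
        return parent_sampled
    # 2. No parent: this span is a root, so make a deterministic ratio decision
    #    from the trace ID. Any service that recomputes it gets the same result.
    lower_64 = int(trace_id_hex[-16:], 16)
    return lower_64 < int(root_ratio * TRACE_ID_SPACE)


incoming_flags = 0x01
print(should_sample("4bf92f3577b34da6a3ce929d0e0e4736", bool(incoming_flags & 0x01), 0.10))  # True
```

The root ratio only matters at the entry point; every downstream hop just follows the flag it received.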

Context Propagation in Practice

Context propagation is the mechanism by which trace context is injected into outgoing requests and extracted from incoming ones. OTel provides a propagator API that handles injection and extraction for different transport formats. By default, OTel SDKs use a composite of the W3C Trace Context and W3C Baggage propagators (OTEL_PROPAGATORS=tracecontext,baggage), and the same propagator works for any text-based carrier. For message queues (Kafka, RabbitMQ, SQS), the same IDs are placed into message headers or attributes.
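Propagators stay transport-agnostic by reading and writing through a carrier. OTel's propagator API models this with getter and setter abstractions; the standalone helpers below are a hypothetical minimal sketch of the same idea for two carriers: an HTTP header dict (case-insensitive names) and Kafka headers as a list of (str, bytes) tuples, the shape confluent-kafka uses.

```python
from typing import Dict, List, Optional, Tuple

HttpHeaders = Dict[str, str]
KafkaHeaders = List[Tuple[str, bytes]]  # e.g. confluent-kafka message headers

TRACEPARENT = "traceparent"


def inject_http(headers: HttpHeaders, traceparent: str) -> None:
    headers[TRACEPARENT] = traceparent


def extract_http(headers: HttpHeaders) -> Optional[str]:
    # HTTP header names are case-insensitive, so compare lowercased keys.
    for key, value in headers.items():
        if key.lower() == TRACEPARENT:
            return value
    return None


def inject_kafka(headers: KafkaHeaders, traceparent: str) -> None:
    # Kafka allows duplicate header keys; drop any stale traceparent first.
    headers[:] = [(k, v) for k, v in headers if k != TRACEPARENT]
    headers.append((TRACEPARENT, traceparent.encode("ascii")))


def extract_kafka(headers: Optional[KafkaHeaders]) -> Optional[str]:
    # A consumed message may carry no headers at all (None).
    for key, value in headers or []:
        if key == TRACEPARENT and value is not None:
            return value.decode("ascii")
    return None


tp = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
http_headers: HttpHeaders = {}
inject_http(http_headers, tp)
kafka_headers: KafkaHeaders = [("content-type", b"application/json")]
inject_kafka(kafka_headers, tp)
print(extract_http(http_headers) == extract_kafka(kafka_headers) == tp)  # True
```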

W3C traceparent propagation: the same trace_id flows unchanged across every hop; each service creates a new span_id and sets its caller as the parent.

# Python (FastAPI): explicit context propagation with the OTel SDK
# pip install opentelemetry-sdk opentelemetry-instrumentation-fastapi opentelemetry-instrumentation-httpx

import json

import httpx
from opentelemetry import trace
from opentelemetry.propagate import inject

tracer = trace.get_tracer("order-service", "1.0.0")

# Inbound: OTel FastAPI instrumentation auto-extracts traceparent from request headers
# and creates a SERVER span that becomes the parent for all work in this request.

# Outbound: inject trace context into downstream HTTP calls.
# Note: if the httpx instrumentation is also enabled, it creates its own CLIENT span and
# injects headers automatically; use either that or this manual pattern, not both.
async def call_inventory_service(order_id: str):
    url = f"http://inventory-svc/stock/{order_id}"
    with tracer.start_as_current_span(
        "inventory.check_stock",
        kind=trace.SpanKind.CLIENT,
        attributes={
            "http.method": "GET",
            "http.url": url,
            "order.id": order_id,
        },
    ) as span:
        headers = {}
        inject(headers)  # OTel writes traceparent (+ tracestate/baggage if present) into this dict
        # headers now: {"traceparent": "00-4bf92f...-<inventory.check_stock span id>-01"}

        async with httpx.AsyncClient() as client:
            response = await client.get(url, headers=headers)  # context propagated to inventory-service
        span.set_attribute("http.status_code", response.status_code)
        if response.status_code != 200:
            span.set_status(trace.StatusCode.ERROR, "inventory check failed")
        return response.json()

# Async/message queue: propagate into Kafka message headers
# (producer is, for example, a confluent_kafka.Producer)
def publish_order_event(producer, topic: str, payload: dict):
    with tracer.start_as_current_span("kafka.produce", kind=trace.SpanKind.PRODUCER) as span:
        span.set_attribute("messaging.system", "kafka")
        span.set_attribute("messaging.destination", topic)
        headers = {}
        inject(headers)  # same inject() call; works for any dict carrier
        producer.produce(
            topic,
            value=json.dumps(payload).encode(),
            headers=list(headers.items()),  # traceparent travels as a Kafka message header
        )

Span Events and Attributes: Production Patterns

Two span features that are consistently underused but critical in production: span events and carefully chosen attributes.

A span event is a timestamped annotation attached to a span. Rather than emitting a separate log line for "cache miss, falling back to database," record it as an event on the active span. This keeps the data collocated — when you are looking at a slow span in the trace UI, you see exactly what happened and when, without having to pivot to the log store and correlate by timestamp.

Attributes should capture the business context that turns "this database query was slow" into "this database query was slow for Premium users in Germany requesting more than 50 items." As a rule of thumb, add at most 20-30 attributes per span: every attribute adds export and storage cost, and many backends index attributes for querying. Avoid large, unbounded values (raw SQL query bodies, full HTTP response bodies) as span attributes; truncate or omit them if needed.

Never put PII or secrets in span attributes or events. Traces are sent to a backend (Jaeger, Tempo, Datadog, Honeycomb) and stored for days or weeks, often with broad internal access. Span attributes are frequently exported to third-party SaaS backends. Scrub user email addresses, phone numbers, payment card numbers, passwords, and auth tokens before they appear in any span. Use a data sanitization layer in your shared instrumentation library, enforced at the SDK level, not left to individual developers.
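A shared sanitization helper might look like the following minimal Python sketch. The denylisted keys, email regex, and limits are hypothetical policy values, not OTel defaults; apply the function to attribute dicts before they reach set_attributes, or implement the same policy in a Collector processor. OTel SDKs also enforce their own span limits, configurable through variables such as OTEL_SPAN_ATTRIBUTE_COUNT_LIMIT and OTEL_ATTRIBUTE_VALUE_LENGTH_LIMIT.

```python
import re
from typing import Any, Dict

# Hypothetical policy values for illustration; tune them for your own data.
DENYLIST_KEYS = {"user.email", "user.phone", "auth.token", "password", "card.number"}
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
MAX_ATTRIBUTES = 30
MAX_VALUE_LENGTH = 256


def sanitize_attributes(attributes: Dict[str, Any]) -> Dict[str, Any]:
    clean: Dict[str, Any] = {}
    for key, value in attributes.items():
        if len(clean) >= MAX_ATTRIBUTES:
            break  # enforce a per-span attribute budget
        if key in DENYLIST_KEYS:
            clean[key] = "[REDACTED]"  # keep the key so its presence is visible
            continue
        if isinstance(value, str):
            value = EMAIL_RE.sub("[EMAIL]", value)  # scrub emails embedded in free text
            if len(value) > MAX_VALUE_LENGTH:
                value = value[:MAX_VALUE_LENGTH] + "...[truncated]"
        clean[key] = value
    return clean


print(sanitize_attributes({"user.tier": "premium", "user.email": "jane@example.com"}))
# {'user.tier': 'premium', 'user.email': '[REDACTED]'}
```

Centralizing this in one function is the point: individual developers call the shared wrapper and never decide the policy themselves.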

// Adding span events and attributes (Go, OpenTelemetry SDK)
// go get go.opentelemetry.io/otel

package checkout

import (
    "context"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/codes"
    "go.opentelemetry.io/otel/trace"
)

// Package-level tracer: Go does not allow := outside a function body.
var tracer = otel.Tracer("checkout-service")

// cache, db, and payments are application dependencies (not shown).
func processOrder(ctx context.Context, orderID string, userTier string) error {
    // service.name belongs on the SDK Resource, not on individual span attributes.
    ctx, span := tracer.Start(ctx, "checkout.process_order",
        trace.WithSpanKind(trace.SpanKindServer),
        trace.WithAttributes(
            attribute.String("order.id", orderID),
            attribute.String("user.tier", userTier), // business context
        ),
    )
    defer span.End()

    // Check cache
    item, found := cache.Get(orderID)
    if !found {
        // Span event: timestamped annotation inside this span
        span.AddEvent("cache.miss", trace.WithAttributes(
            attribute.String("cache.key", "order:"+orderID),
            attribute.String("cache.store", "redis"),
        ))
        item = db.FetchOrder(ctx, orderID) // child span auto-created by DB instrumentation
    }

    if err := payments.Charge(ctx, item); err != nil {
        // Mark span as ERROR: surfaces in Jaeger/Tempo error filters
        span.RecordError(err)
        span.SetStatus(codes.Error, "payment charge failed")
        span.SetAttributes(attribute.String("error.type", "payment_declined"))
        return err
    }

    span.SetAttributes(
        attribute.Int64("order.item_count", int64(len(item.SKUs))),
        attribute.Float64("order.amount_usd", item.TotalUSD),
    )
    return nil
}

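Propagate context through async boundaries explicitly. Goroutines, thread pools, and some async task mechanisms sever the implicit context chain. In Go, always pass context.Context explicitly into the goroutine. In Java, wrap tasks with Context.current().wrap(...) or wrap the executor with Context.taskWrapping(...). In Python, asyncio tasks copy the current contextvars context automatically, but work submitted to a thread pool does not inherit it by default, so capture contextvars.copy_context() before the boundary and run the task inside it. Auto-instrumentation agents (such as the OTel Java agent) cover some common executors, but not every custom queue or callback mechanism; a missed boundary is one of the most common causes of "broken traces" in production where parent-child links are missing.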
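In Python, OTel's default runtime context is built on contextvars, so the capture-and-restore pattern reduces to contextvars.copy_context() and Context.run. The sketch below uses a plain ContextVar as a stand-in for the SDK's current-span slot (current_span_id is a hypothetical name) to show the difference between submitting work to a thread pool with and without the captured context.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the SDK's "current span" slot.
current_span_id: contextvars.ContextVar = contextvars.ContextVar("current_span_id", default=None)


def child_work() -> str:
    # A tracing SDK would read the current span here to choose the parent.
    return f"parent={current_span_id.get()}"


def run_without_context(pool: ThreadPoolExecutor) -> str:
    return pool.submit(child_work).result()


def run_with_context(pool: ThreadPoolExecutor) -> str:
    ctx = contextvars.copy_context()                  # capture before the boundary
    return pool.submit(ctx.run, child_work).result()  # restore inside the worker


token = current_span_id.set("00f067aa0ba902b7")  # pretend a span is active
with ThreadPoolExecutor(max_workers=1) as pool:
    print(run_without_context(pool))  # on standard CPython builds the worker typically sees no span
    print(run_with_context(pool))     # parent=00f067aa0ba902b7
current_span_id.reset(token)
```

Copying the context per submission (rather than sharing one Context object) matters: a single Context cannot be entered by two threads at once.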

What Breaks Traces in Production

Understanding the failure modes is as important as understanding the happy path. Common causes of broken or incomplete traces:

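Missing propagation at a single hop: One service (often a legacy system, a load balancer, or an API gateway) strips or ignores traceparent. Everything downstream of that hop then starts a new trace with a fresh trace ID, so a single request is split across unrelated traces. A related failure occurs when a hop forwards the header but never exports its own span: downstream spans keep the trace ID, but their parent link points to a span that the backend never received, creating a disconnected subtree. Fix: audit every service boundary and every HTTP proxy configuration.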
Async context loss: A span is started in thread A, work is queued to thread B, and thread B creates child spans — but without the context being passed across the async boundary, thread B creates a new root span instead of a child. The trace splits into two unrelated trees.
Clock skew: Span timestamps come from the host where the SDK runs. If hosts have clock drift (NTP not configured), a child span can appear to start before its parent started, or a synchronous call can appear to finish after its caller did, which is physically impossible. Production fix: run chrony or ntpd on all nodes; some backends, such as Jaeger's query service, also apply a clock-skew adjustment heuristic when displaying traces.
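Sampling mismatch: If each service makes its own head-based sampling decision instead of honoring the parent's sampled flag (say Service A samples 10% and Service B samples 5%), a trace sampled at A may not be sampled at B, creating an incomplete trace. Either use parent-based sampling so the root service's decision propagates, or use tail-based sampling at the Collector layer to make the keep/drop decision once, centrally, for the entire trace. Tail-based sampling requires every span of a trace to reach the same Collector instance, for example via a load-balancing exporter keyed on trace ID.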
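Span batch dropped under load: The OTel SDK's batch span processor buffers spans in an in-memory queue before exporting. Under a traffic spike, if that queue fills faster than the exporter can drain it, new spans are dropped. Size the SDK queue for your peak load (OTEL_BSP_MAX_QUEUE_SIZE, default 2048) as well as the Collector exporter's sending_queue.queue_size. SDK-side drops happen before spans ever reach the Collector, so watch SDK logs and self-telemetry in addition to Collector metrics such as otelcol_exporter_send_failed_spans_total. A dead Collector or network partition silently drops all spans once retries are exhausted, so build alerting on export failure metrics.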
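Several of these failure modes can be detected mechanically once a trace's spans are in hand. Below is a minimal Python sketch, assuming spans have already been grouped by trace ID; ReceivedSpan and check_trace are hypothetical names. It flags missing or multiple roots, orphans whose parent never arrived, and children that start before their parent (a likely clock-skew symptom). It deliberately does not flag children that end after their parent, since asynchronous work can legitimately outlive its caller.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReceivedSpan:
    span_id: str
    parent_id: Optional[str]  # None for a root span
    start_ns: int
    end_ns: int


def check_trace(spans: List[ReceivedSpan]) -> List[str]:
    by_id = {s.span_id: s for s in spans}
    problems: List[str] = []

    roots = [s for s in spans if s.parent_id is None]
    if len(roots) != 1:
        problems.append(f"expected 1 root span, found {len(roots)}")

    for span in spans:
        if span.parent_id is None:
            continue
        parent = by_id.get(span.parent_id)
        if parent is None:
            # Parent never received: unexported hop or dropped batch.
            problems.append(f"orphan {span.span_id}: parent {span.parent_id} missing")
        elif span.start_ns < parent.start_ns:
            # A child cannot start before its parent; host clocks likely disagree.
            problems.append(f"clock skew? {span.span_id} starts before parent {parent.span_id}")
    return problems


trace_spans = [
    ReceivedSpan("a", None, 0, 100),
    ReceivedSpan("b", "a", 10, 50),
    ReceivedSpan("c", "zz", 20, 30),  # parent never arrived
    ReceivedSpan("d", "a", -5, 40),   # starts before its parent
]
for problem in check_trace(trace_spans):
    print(problem)
```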

In the next lesson we move to the OpenTelemetry standard itself — its component model (SDK, API, Collector, semantic conventions), how it achieved vendor neutrality, and how to evaluate it against proprietary agents like the Datadog tracer or Dynatrace OneAgent for a greenfield service or a migration.