Resilience, Messaging & Observability

Project: A Resilient, Observable Service

This capstone lesson pulls together the patterns from the tutorial into a single Spring Boot 3 service. You will build an Order Processing Service that calls a downstream Inventory Service over HTTP, publishes domain events to Kafka, exposes Micrometer metrics, guards the downstream call with Resilience4j circuit breakers and retries, and emits distributed traces that Zipkin can collect. By the end you will have a service you can run, break, and watch recover, which is an essential skill for operating microservices at scale.

Project Architecture

The service has three concerns wired together:

  • Inbound REST API: accepts POST /orders requests from clients.
  • Resilient downstream call: queries inventory-service with a circuit breaker, retry, and timeout.
  • Event publishing: emits an OrderPlaced event to a Kafka topic after a successful stock check.

Observability is not bolted on at the end — it is built in from line one via Micrometer, Spring Boot Actuator, and Micrometer Tracing.

Dependencies (pom.xml)

<dependencies>
    <!-- Web -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <!-- Bean Validation, needed for @Valid on the controller -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-validation</artifactId>
    </dependency>
    <!-- AOP, needed by the Resilience4j annotations and by @Observed -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-aop</artifactId>
    </dependency>
    <!-- Resilience4j via Spring Cloud -->
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-circuitbreaker-resilience4j</artifactId>
    </dependency>
    <!-- Kafka -->
    <dependency>
        <groupId>org.springframework.kafka</groupId>
        <artifactId>spring-kafka</artifactId>
    </dependency>
    <!-- Actuator + Micrometer Prometheus -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>
    <dependency>
        <groupId>io.micrometer</groupId>
        <artifactId>micrometer-registry-prometheus</artifactId>
    </dependency>
    <!-- Distributed tracing (Brave / Zipkin) -->
    <dependency>
        <groupId>io.micrometer</groupId>
        <artifactId>micrometer-tracing-bridge-brave</artifactId>
    </dependency>
    <dependency>
        <groupId>io.zipkin.reporter2</groupId>
        <artifactId>zipkin-reporter-brave</artifactId>
    </dependency>
</dependencies>

application.yml — Central Configuration

Keeping resilience policy, Kafka bootstrap, and observability sampling in one file makes the service's behaviour self-documenting.

spring:
  application:
    name: order-service
  kafka:
    bootstrap-servers: localhost:9092
    producer:
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: org.springframework.kafka.support.serializer.JsonSerializer
    template:
      observation-enabled: true   # propagate trace context into Kafka record headers

management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus,circuitbreakers
  tracing:
    sampling:
      probability: 1.0   # 100 % in dev; set 0.1 in prod

resilience4j:
  circuitbreaker:
    instances:
      inventory:
        slidingWindowSize: 10
        failureRateThreshold: 50
        waitDurationInOpenState: 10s
        permittedNumberOfCallsInHalfOpenState: 3
  retry:
    instances:
      inventory:
        maxAttempts: 3
        waitDuration: 500ms
        retryExceptions:
          - java.io.IOException
          - org.springframework.web.client.ResourceAccessException
  timelimiter:
    instances:
      inventory:
        timeoutDuration: 2s   # only used by @TimeLimiter on asynchronous methods

inventory:
  base-url: http://localhost:8081
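To make the four circuit-breaker settings concrete, here is a minimal plain-Java sketch of a count-based breaker. It is a teaching model only, not Resilience4j's implementation: the class name SlidingWindowBreaker and its methods are hypothetical, it is single-threaded, and it ignores slow-call tracking and the cap on concurrent trial calls in the half-open state.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;

/** Teaching sketch of a count-based circuit breaker; not Resilience4j's implementation. */
class SlidingWindowBreaker {

    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int windowSize;          // cf. slidingWindowSize
    private final double failureThreshold; // cf. failureRateThreshold, in percent
    private final Duration openWait;       // cf. waitDurationInOpenState
    private final int halfOpenCalls;       // cf. permittedNumberOfCallsInHalfOpenState

    private final Deque<Boolean> outcomes = new ArrayDeque<>(); // true = failed call
    private State state = State.CLOSED;
    private Instant openedAt;

    SlidingWindowBreaker(int windowSize, double failureThreshold,
                         Duration openWait, int halfOpenCalls) {
        this.windowSize = windowSize;
        this.failureThreshold = failureThreshold;
        this.openWait = openWait;
        this.halfOpenCalls = halfOpenCalls;
    }

    /** Asks whether a call may proceed; moves OPEN to HALF_OPEN once the wait has elapsed. */
    boolean tryAcquire(Instant now) {
        if (state == State.OPEN && !now.isBefore(openedAt.plus(openWait))) {
            state = State.HALF_OPEN;
            outcomes.clear();
        }
        return state != State.OPEN;
    }

    /** Records the outcome of a call that was allowed through. */
    void record(boolean failed, Instant now) {
        if (state == State.OPEN) {
            return; // rejected calls are not part of the window
        }
        int needed = (state == State.HALF_OPEN) ? halfOpenCalls : windowSize;
        outcomes.addLast(failed);
        if (outcomes.size() > needed) {
            outcomes.removeFirst(); // keep only the most recent outcomes
        }
        if (outcomes.size() < needed) {
            return; // too few samples to judge the failure rate
        }
        long failures = outcomes.stream().filter(Boolean::booleanValue).count();
        double failureRate = 100.0 * failures / outcomes.size();
        if (failureRate >= failureThreshold) {
            state = State.OPEN;
            openedAt = now;
            outcomes.clear();
        } else if (state == State.HALF_OPEN) {
            state = State.CLOSED;
            outcomes.clear();
        }
    }

    State state() {
        return state;
    }
}
```

With the values from application.yml (10, 50, 10s, 3), the sketch stays closed until ten outcomes are recorded, opens when at least half of them failed, rejects calls for ten seconds, and then closes again after three successful trial calls.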

Domain Model

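import jakarta.validation.constraints.NotBlank;
import jakarta.validation.constraints.Positive;

// Constraints give the controller's @Valid something to enforce
public record OrderRequest(@NotBlank String productId, @Positive int quantity) {}

public record OrderResult(String orderId, String status, String message) {}

public record InventoryResponse(String productId, boolean available, int stock) {}

public record OrderPlacedEvent(String orderId, String productId, int quantity, long timestamp) {}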

InventoryClient — Resilient HTTP Call

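The client wraps RestTemplate with Resilience4j annotations. @CircuitBreaker and @Retry are applied at the method level. In Resilience4j's default aspect order, Retry wraps CircuitBreaker, so the fallback is declared on @Retry: a fallback on the inner @CircuitBreaker would turn every failure into a normal return value, and the retry would never fire. The timelimiter instance in application.yml only takes effect through @TimeLimiter on methods that return a CompletableFuture (or a reactive type), so this synchronous call relies on the RestTemplate connect and read timeouts instead. The fallback method must have the same parameters and return type as the protected method, plus a trailing Throwable parameter.

import java.time.Duration;

import io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker;
import io.github.resilience4j.retry.annotation.Retry;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.web.client.RestTemplateBuilder;
import org.springframework.stereotype.Component;
import org.springframework.web.client.RestTemplate;

@Component
public class InventoryClient {

    private final RestTemplate restTemplate;

    @Value("${inventory.base-url}")
    private String baseUrl;

    public InventoryClient(RestTemplateBuilder builder) {
        // Client-level timeouts bound each attempt
        // (these setters are named connectTimeout/readTimeout from Spring Boot 3.4)
        this.restTemplate = builder
                .setConnectTimeout(Duration.ofSeconds(1))
                .setReadTimeout(Duration.ofSeconds(2))
                .build();
    }

    @Retry(name = "inventory", fallbackMethod = "inventoryFallback")
    @CircuitBreaker(name = "inventory")
    public InventoryResponse checkStock(String productId) {
        // A URI template keeps the path variable encoded and out of metric tags
        return restTemplate.getForObject(
                baseUrl + "/inventory/{productId}", InventoryResponse.class, productId);
    }

    // Called when the circuit is open, on a non-retryable error,
    // or when all retry attempts are exhausted
    public InventoryResponse inventoryFallback(String productId, Throwable ex) {
        // Degrade gracefully: assume unavailable rather than crashing
        return new InventoryResponse(productId, false, 0);
    }
}

Fallback must not do I/O. If your fallback itself calls a database or another service, you risk cascading failures. Return a cached value, a safe default, or a circuit-open error response; never add another network hop.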
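The retry settings (maxAttempts: 3, waitDuration: 500ms, retryExceptions) can be pictured as a simple loop. The sketch below is plain Java under stated assumptions, not how Resilience4j is implemented: RetrySketch and callWithRetry are hypothetical names, and UncheckedIOException stands in for the configured retryable exceptions. In Resilience4j, maxAttempts counts the initial call, so 3 means one call plus at most two retries.

```java
import java.io.UncheckedIOException;
import java.time.Duration;
import java.util.function.Function;
import java.util.function.Supplier;

/** Teaching sketch of retry-then-fallback; the names here are illustrative only. */
class RetrySketch {

    static <T> T callWithRetry(Supplier<T> call, int maxAttempts,
                               Duration waitDuration, Function<Throwable, T> fallback) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Throwable lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (UncheckedIOException ex) {
                // Retryable failure: remember it and wait before the next attempt
                lastFailure = ex;
                if (attempt < maxAttempts && !pause(waitDuration)) {
                    break; // interrupted while waiting, stop retrying
                }
            } catch (RuntimeException ex) {
                // Anything else is treated as non-retryable: fall back immediately
                return fallback.apply(ex);
            }
        }
        // All attempts failed: hand the last error to the fallback
        return fallback.apply(lastFailure);
    }

    private static boolean pause(Duration waitDuration) {
        try {
            Thread.sleep(waitDuration.toMillis());
            return true;
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }
}
```

The fallback here only maps a Throwable to a value, which mirrors the rule above: it runs after the network has already failed, so it should not reach for the network again.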

OrderService — Business Logic with Metrics

import java.util.UUID;

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.observation.annotation.Observed;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class OrderService {

    private final InventoryClient inventoryClient;
    private final KafkaTemplate<String, OrderPlacedEvent> kafkaTemplate;
    private final MeterRegistry meterRegistry;
    private final Counter ordersPlaced;
    private final Counter ordersRejected;

    public OrderService(InventoryClient inventoryClient,
                        KafkaTemplate<String, OrderPlacedEvent> kafkaTemplate,
                        MeterRegistry meterRegistry) {
        this.inventoryClient = inventoryClient;
        this.kafkaTemplate = kafkaTemplate;
        this.meterRegistry = meterRegistry;
        this.ordersPlaced = Counter.builder("orders.placed")
                .description("Successfully placed orders")
                .register(meterRegistry);
        this.ordersRejected = Counter.builder("orders.rejected")
                .description("Orders rejected due to stock")
                .register(meterRegistry);
    }

    @Observed(name = "order.place", contextualName = "placing-order")
    public OrderResult placeOrder(OrderRequest request) {
        InventoryResponse stock = inventoryClient.checkStock(request.productId());
        if (!stock.available() || stock.stock() < request.quantity()) {
            ordersRejected.increment();
            return new OrderResult(null, "REJECTED", "Insufficient stock");
        }

        String orderId = UUID.randomUUID().toString();
        OrderPlacedEvent event = new OrderPlacedEvent(
                orderId, request.productId(), request.quantity(),
                System.currentTimeMillis());

        // send() is asynchronous; the returned future is ignored here,
        // so a broker failure would not be noticed by this method
        kafkaTemplate.send("orders.placed", orderId, event);
        ordersPlaced.increment();

        // Record a distribution summary for order size
        meterRegistry.summary("order.quantity").record(request.quantity());

        return new OrderResult(orderId, "PLACED", "Order accepted");
    }
}

@Observed creates a span automatically. Micrometer's ObservedAspect (an AOP aspect) wraps each @Observed method in an Observation, and Micrometer Tracing turns that Observation into a span. This needs spring-boot-starter-aop on the classpath and a registered ObservedAspect; depending on the Spring Boot version you either declare that bean yourself or enable the annotation auto-configuration. You get distributed traces without writing Tracer boilerplate in every method.
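What "wraps the method" means can be shown without any framework. The following plain-Java sketch is an illustration only, not Micrometer's code: SpanRecorder, FinishedSpan, and observe are hypothetical names. It times a block of work, notes any error, and always records the result, which is roughly the job the aspect does around an annotated method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Teaching sketch of an "around" wrapper; not Micrometer's Observation API. */
class SpanRecorder {

    /** What gets recorded once the wrapped work has finished. */
    record FinishedSpan(String name, long durationNanos, String error) {}

    private final List<FinishedSpan> finished = new ArrayList<>();

    /** Runs the body, measuring duration and capturing the error type if it throws. */
    <T> T observe(String name, Supplier<T> body) {
        long start = System.nanoTime();
        String error = null;
        try {
            return body.get();
        } catch (RuntimeException ex) {
            error = ex.getClass().getSimpleName();
            throw ex; // the caller still sees the original exception
        } finally {
            // Runs on both success and failure, so every call leaves a record
            finished.add(new FinishedSpan(name, System.nanoTime() - start, error));
        }
    }

    List<FinishedSpan> finished() {
        return List.copyOf(finished);
    }
}
```

A call such as recorder.observe("placing-order", () -> service.placeOrder(request)) would correspond to what the annotation arranges declaratively; service and request are placeholders here.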

OrderController

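import jakarta.validation.Valid;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/orders")
public class OrderController {

    private final OrderService orderService;

    public OrderController(OrderService orderService) {
        this.orderService = orderService;
    }

    @PostMapping
    public ResponseEntity<OrderResult> place(@RequestBody @Valid OrderRequest request) {
        OrderResult result = orderService.placeOrder(request);
        // 201 for an accepted order, 422 for a rejected one
        HttpStatus status = "PLACED".equals(result.status())
                ? HttpStatus.CREATED
                : HttpStatus.UNPROCESSABLE_ENTITY;
        return ResponseEntity.status(status).body(result);
    }
}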

Observing the Circuit Breaker in Action

Resilience4j's Spring Boot integration contributes a dedicated Actuator endpoint, /actuator/circuitbreakers. While the downstream service is down, send several requests and watch the circuit open:

# Check circuit breaker state via Actuator
curl http://localhost:8080/actuator/circuitbreakers

# Sample response (OPEN state after threshold exceeded):
# {
#   "circuitBreakers": {
#     "inventory": {
#       "failureRate": "60.0%",
#       "state": "OPEN",
#       "bufferedCalls": 10,
#       "failedCalls": 6
#     }
#   }
# }

Prometheus Metrics Endpoint

Scrape /actuator/prometheus to see all Micrometer meters, including the custom counters and the Resilience4j integration metrics:

# Custom application counters
orders_placed_total 42.0
orders_rejected_total 7.0

# Resilience4j circuit-breaker metrics (auto-registered)
resilience4j_circuitbreaker_calls_seconds_count{kind="successful",name="inventory"} 38.0
resilience4j_circuitbreaker_calls_seconds_count{kind="failed",name="inventory"} 4.0
resilience4j_circuitbreaker_state{name="inventory",state="closed"} 1.0

# Order quantity distribution
order_quantity_count 42.0
order_quantity_sum 187.0
order_quantity_max 15.0

Distributed Trace Flow

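Every inbound HTTP request automatically gets a traceId from Micrometer Tracing. The trace context propagates into the downstream HTTP call because the RestTemplate is built from the auto-configured RestTemplateBuilder, which instruments it. It reaches the Kafka record headers only when observation is enabled on the KafkaTemplate (spring.kafka.template.observation-enabled: true); with the dependencies used in this lesson, Spring Kafka's Micrometer observation support does this work rather than Brave's Kafka instrumentation. In Zipkin you see: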

  • Span 1: POST /orders, the root span.
  • Span 2: placing-order, the @Observed span inside OrderService.
  • Span 3: GET /inventory/{productId}, the outbound call; each Resilience4j retry attempt shows up as its own client span.
  • Span 4: orders.placed send, the Kafka publish span.
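The spans above share one trace because each hop forwards a small context header. The sketch below models that in plain Java, assuming the W3C Trace Context traceparent format (version 00), which to my understanding is the default propagation format in Spring Boot 3; B3 headers look different. TraceContext and its methods are hypothetical names, not part of Micrometer or Brave.

```java
import java.security.SecureRandom;
import java.util.HexFormat;

/** Teaching sketch of W3C traceparent handling; not Micrometer's or Brave's API. */
record TraceContext(String traceId, String spanId, boolean sampled) {

    private static final SecureRandom RANDOM = new SecureRandom();

    /** Parses "00-<32 hex trace id>-<16 hex span id>-<2 hex flags>". */
    static TraceContext parse(String header) {
        String[] parts = header.split("-");
        if (parts.length != 4 || !parts[0].equals("00")
                || parts[1].length() != 32 || parts[2].length() != 16
                || parts[3].length() != 2) {
            throw new IllegalArgumentException("Unsupported traceparent: " + header);
        }
        // Bit 0 of the flags byte is the "sampled" flag
        boolean sampled = (Integer.parseInt(parts[3], 16) & 1) == 1;
        return new TraceContext(parts[1], parts[2], sampled);
    }

    /** A child span keeps the trace id and sampling decision but gets a fresh span id. */
    TraceContext newChild() {
        byte[] id = new byte[8];
        RANDOM.nextBytes(id);
        return new TraceContext(traceId, HexFormat.of().formatHex(id), sampled);
    }

    /** Renders the header value to attach to an outgoing HTTP request or Kafka record. */
    String toHeader() {
        return "00-" + traceId + "-" + spanId + "-" + (sampled ? "01" : "00");
    }
}
```

Each outgoing call would send newChild().toHeader(), so Zipkin can group the HTTP, inventory, and Kafka spans under one trace id while still telling them apart by span id.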

Testing Resilience

The cleanest way to test the circuit breaker is with @SpringBootTest plus a WireMock stub that returns 500 errors:

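import static com.github.tomakehurst.wiremock.client.WireMock.get;
import static com.github.tomakehurst.wiremock.client.WireMock.serverError;
import static com.github.tomakehurst.wiremock.client.WireMock.stubFor;
import static com.github.tomakehurst.wiremock.client.WireMock.urlPathMatching;
import static org.assertj.core.api.Assertions.assertThat;
import static org.springframework.boot.test.context.SpringBootTest.WebEnvironment.RANDOM_PORT;

import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.boot.test.web.client.TestRestTemplate;
import org.springframework.cloud.contract.wiremock.AutoConfigureWireMock;

@SpringBootTest(webEnvironment = RANDOM_PORT)
@AutoConfigureWireMock(port = 8081)
class OrderServiceResilienceTest {

    @Autowired
    TestRestTemplate client;

    @Autowired
    CircuitBreakerRegistry cbRegistry;

    @Test
    void circuitOpensAfterRepeatedFailures() {
        // Stub inventory to always fail
        stubFor(get(urlPathMatching("/inventory/.*"))
                .willReturn(serverError()));

        // Drive 10 calls through the service (matches slidingWindowSize).
        // A 500 is not in retryExceptions, so each request hits the breaker once.
        for (int i = 0; i < 10; i++) {
            client.postForEntity("/orders",
                    new OrderRequest("PROD-1", 1), OrderResult.class);
        }

        CircuitBreaker cb = cbRegistry.circuitBreaker("inventory");
        assertThat(cb.getState()).isEqualTo(CircuitBreaker.State.OPEN);
    }
}

Test failure modes, not just happy paths. A service that passes all green-path tests but has never been tested under circuit-open conditions, timeout expiry, or Kafka broker unavailability is not production-ready. Chaos-test your service in CI, not in prod.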

Production Checklist

  • Set management.tracing.sampling.probability to 0.05–0.10 in production — 100 % sampling creates significant overhead and storage cost.
  • Never expose /actuator/* to the public internet. Place it on an internal port or protect it with Spring Security.
  • Alert on resilience4j_circuitbreaker_state transitions — an open circuit is a production incident, not just a metric blip.
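  • Handle Kafka send failures explicitly: inspect the future returned by KafkaTemplate.send (or use a transactional outbox) so no event is silently dropped. DeadLetterPublishingRecoverer solves a different problem, namely records that a consumer fails to process.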
  • Tag all custom metrics with meaningful labels (region, env) so dashboards can slice by deployment.
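For the Kafka item in the checklist, one hedged option is to react to the future that a send returns. In Spring Kafka 3.x, KafkaTemplate.send returns a CompletableFuture; the plain-Java sketch below works against any CompletableFuture so it can run without a broker. SendOutcomeTracker and the onFailure hook are hypothetical names, and what the hook does (log, park the event in an outbox table, raise an alert) is left as an assumption.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/** Teaching sketch: count send outcomes and hand failed events to a recovery hook. */
class SendOutcomeTracker<E> {

    private final AtomicLong published = new AtomicLong();
    private final AtomicLong failed = new AtomicLong();
    private final Consumer<E> onFailure; // hypothetical hook, e.g. store the event for replay

    SendOutcomeTracker(Consumer<E> onFailure) {
        this.onFailure = onFailure;
    }

    /** Attaches a callback to the send future; the event is only counted once it completes. */
    <R> CompletableFuture<R> track(E event, CompletableFuture<R> sendFuture) {
        return sendFuture.whenComplete((result, error) -> {
            if (error == null) {
                published.incrementAndGet();
            } else {
                failed.incrementAndGet();
                onFailure.accept(event); // never drop the event silently
            }
        });
    }

    long published() {
        return published.get();
    }

    long failed() {
        return failed.get();
    }
}
```

In the service this would wrap the send call, for example tracker.track(event, kafkaTemplate.send("orders.placed", orderId, event)), and the two counters could back the Micrometer meters instead of incrementing before the broker has acknowledged anything.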

Summary

You have assembled a service that tolerates downstream failures gracefully via circuit breakers and retries, decouples side effects via Kafka events, and surfaces its internal state to operations teams via Prometheus metrics, Actuator endpoints, and Zipkin traces. Each pattern from the earlier lessons has a concrete, testable role here. This is what production-grade microservice code looks like: not individually clever, but collectively robust and transparent.