Streaming Basics with Kafka
Streaming Basics with Kafka
The previous lessons in this tutorial covered resilience patterns — circuit breakers, retries, bulkheads — that protect a service from downstream failures. Those patterns still assumed a synchronous request/response model. Apache Kafka introduces a fundamentally different model: event streaming. Instead of a caller waiting for a reply, a producer appends an immutable event to a durable log, and one or more consumers read from that log independently and at their own pace.
This lesson introduces Kafka's core abstractions and shows how to produce and consume messages from a Spring Boot 3 application using Spring Cloud Stream — the framework-level abstraction that lets you switch message brokers without rewriting business logic.
Why Kafka?
Traditional message queues (RabbitMQ, ActiveMQ) delete a message once it has been successfully consumed. Kafka is different: it is a distributed, replicated commit log. Messages are retained for a configurable period (hours, days, weeks) regardless of whether anyone has read them. This gives you several properties that are difficult to achieve with queues:
- Replay: a new consumer can start from the beginning of the log and reconstruct state.
- Fan-out at scale: thousands of independent consumer groups can read the same topic without any coordination.
- Exactly-once semantics (with careful configuration): transactions and idempotent producers let you guarantee that an event is processed exactly once even under failure.
- Ordering guarantees: messages within a single partition are strictly ordered.
Core Kafka Concepts
Before writing any code, make sure the vocabulary is clear:
- Topic: a named, append-only log. Think of it as a category for your events (e.g.,
order-placed,payment-processed). - Partition: a topic is split into one or more partitions for parallelism. Each partition is an independent ordered log.
- Offset: the sequential position of a message within a partition. Kafka never removes messages by default; consumers track their own offset.
- Producer: writes records to a topic. It chooses which partition to write to (round-robin, hash of key, or custom).
- Consumer group: one or more consumers sharing a group ID. Kafka assigns partitions to group members so each partition is consumed by exactly one member at a time. Multiple groups can read the same topic independently.
- Broker: a Kafka server. A cluster has multiple brokers for replication and load distribution.
Spring Cloud Stream Overview
Spring Cloud Stream (SCS) wraps the Kafka client behind a binder abstraction. Your application code works with functional beans — Supplier, Function, and Consumer from java.util.function. SCS maps these to Kafka topics via configuration, letting you swap the binder (to RabbitMQ, for example) without touching business logic.
Add the Kafka binder to pom.xml:
Spring Cloud BOM (managed via the parent or dependencyManagement) keeps the version aligned with your Spring Boot version. For Spring Boot 3.x use Spring Cloud 2023.x.
Defining a Producer with Supplier
A Supplier<T> bean is polled by the framework on a schedule and its return value is sent as a message. This is useful for polling-based sources (reading from a database, generating heartbeat events). For event-driven production, use StreamBridge instead (shown after the consumer example).
Bind it to a Kafka topic in application.yml:
<functionName>-in-0 (input) and <functionName>-out-0 (output). The 0 is the index for multi-input/output functions. Always double-check the binding name or set it explicitly via spring.cloud.function.definition.
Defining a Consumer with java.util.function.Consumer
A Consumer<T> bean receives messages from its bound input topic:
Setting group is critical in production. Without it, SCS creates an anonymous group on every restart, so your service always starts reading from the latest offset — missed events from any downtime are permanently lost.
Imperative Production with StreamBridge
When you need to send a message in response to an HTTP request or some other trigger (not on a schedule), inject StreamBridge:
Message Keys and Partitioning
Kafka guarantees ordering only within a partition. If you need all events for the same order to be processed in order, they must land on the same partition. You achieve this by setting a message key. SCS exposes this via the KafkaHeaders.MESSAGE_KEY header:
Kafka hashes the key to decide the partition, so all events sharing the same orderId land on the same partition and are processed in the order they arrived.
Error Handling and Dead-Letter Topics
If a consumer throws an exception, SCS retries by default (configurable via maxAttempts). After exhausting retries the message can be sent to a dead-letter topic (DLT) instead of being silently dropped:
Security Considerations
Production Kafka clusters should enforce authentication and encryption. The most common approach is SASL/SCRAM over TLS:
Always load credentials from environment variables or a secrets manager (Vault, AWS Secrets Manager). A plaintext password in application.yml committed to source control is as dangerous as an exposed database password.
Local Development with Docker Compose
Spin up a minimal Kafka cluster for local development with a single docker-compose.yml:
Alternatively, use the newer KRaft mode (Kafka without Zookeeper) available from Confluent's cp-kafka 7.4+ images with a single container.
Summary
Kafka's persistent, replayable log model makes it the foundation of event-driven microservice architectures. Spring Cloud Stream lets you produce and consume Kafka messages through idiomatic Spring beans — Supplier, Consumer, and StreamBridge — with topology wired entirely through configuration. The patterns you need to master immediately are: always set a consumer group, use message keys for ordering guarantees, and configure a dead-letter topic so no event is silently lost. The next lesson covers distributed tracing, which becomes essential once events flow asynchronously across services.