Asynchronous Processing & Messaging

Apache Kafka & Event Streaming

18 min Lesson 4 of 10


Traditional message queues remove a message once it is consumed. Apache Kafka takes a fundamentally different approach: it is a distributed commit log. Every event is written to an append-only, durable log and stays there for a configurable retention period — days, weeks, or forever. Consumers do not delete messages; they simply advance a pointer (called an offset) through the log. This single design decision unlocks capabilities that ordinary queues cannot match.

The Commit Log

Think of a Kafka topic (for now, one with a single partition) as a file on disk that only supports appending. Producers write records to the end of this file. Every record has a monotonically increasing offset, an integer that uniquely identifies its position in the log. Consumers read by specifying an offset: "give me everything from offset 1 000 000 onwards."

Because the log is immutable and retained, several powerful patterns become trivial:

Replay — a new downstream service can consume all historical events from offset 0, catching up to the present without any special export job.
Multiple independent consumers — two services can read the same topic at completely different positions without interfering with each other.
Audit trail — the log is an accurate, ordered record of everything that happened, making debugging and compliance straightforward.

Key idea: In a traditional queue, reading a message destroys it. In Kafka, reading a message advances your cursor. The log belongs to Kafka, not to the consumer.
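
To make the cursor idea concrete, here is a minimal in-memory sketch of an append-only log. It is a toy model, not a Kafka client: the CommitLog class, the consumer names, and the event names are all hypothetical and exist only for illustration.

```python
class CommitLog:
    """Toy in-memory stand-in for one append-only log (not a Kafka client)."""

    def __init__(self):
        self._records = []

    def append(self, value):
        """Append a record and return the offset it was assigned."""
        self._records.append(value)
        return len(self._records) - 1

    def read(self, offset, max_records=100):
        """Return up to max_records starting at offset; nothing is deleted."""
        return self._records[offset:offset + max_records]


log = CommitLog()
for event in ["order-created", "order-paid", "order-shipped"]:
    log.append(event)

# Each consumer keeps its own cursor; reading does not change the log.
cursors = {"analytics": 0, "search-indexer": 2}

analytics_batch = log.read(cursors["analytics"])
cursors["analytics"] += len(analytics_batch)

indexer_batch = log.read(cursors["search-indexer"])
cursors["search-indexer"] += len(indexer_batch)

# A late-joining service replays from offset 0 and sees the full history.
replayed = log.read(0)
```

Both consumers end at offset 3 even though they started at different positions, and the replay still returns all three events because reads never remove anything.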

Partitions: The Unit of Parallelism

A single log quickly becomes a bottleneck at high throughput. Kafka solves this by splitting each topic into partitions: ordered, independent sub-logs distributed across the brokers in the cluster. Each partition has its own offset sequence. A topic with 12 partitions can sustain up to roughly 12× the write throughput of a single-partition topic, provided the partitions are spread over enough brokers and keys are evenly distributed, because producers write to multiple partitions in parallel.

When a producer sends a message, the producer client's partitioner decides which partition it goes to:

By key: hash(key) % num_partitions. All events for the same order ID, user ID, or sensor ID land in the same partition, guaranteeing per-key ordering as long as the partition count does not change.
No key: records are spread across partitions for maximum throughput. Older clients rotate round-robin per record; since Kafka 2.4 the default "sticky" partitioner fills a batch for one partition before moving to the next.
Custom partitioner: your own logic, for example routing VIP customers to dedicated partitions.

Ordering guarantee: Kafka guarantees ordering within a partition, never across partitions. If event order matters (e.g. account debit before credit), all related events must share the same partition key.
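
The key-based rule can be sketched in a few lines. This is an illustration under stated assumptions: zlib.crc32 stands in for a stable hash (Kafka's Java client uses murmur2 by default), and partition_for, events_for, NUM_PARTITIONS, and the order IDs are hypothetical names chosen for the example.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # illustrative value, not a recommendation


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition with a stable hash.

    crc32 is only a stand-in here; the real Java client hashes with murmur2.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


events = [
    ("order-17", "created"),
    ("order-42", "created"),
    ("order-17", "paid"),
    ("order-42", "paid"),
    ("order-17", "shipped"),
]

# Each partition is its own ordered list, like an independent sub-log.
partitions = defaultdict(list)
for key, value in events:
    partitions[partition_for(key)].append((key, value))


def events_for(key: str):
    """Read back one key's events from the single partition that holds them."""
    return [v for k, v in partitions[partition_for(key)] if k == key]
```

Because every event for order-17 hashes to the same partition, events_for("order-17") returns them in the order they were produced. Changing NUM_PARTITIONS changes the mapping, which is why per-key ordering depends on a stable partition count.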

Consumer Groups: Scaling Consumption

A consumer group is a set of consumer processes that jointly consume a topic. Kafka assigns each partition to exactly one consumer in the group at any time — this is called partition assignment. The result is that the group collectively processes all partitions in parallel, with no duplication.

The rule of thumb: the maximum parallelism of a consumer group equals the number of partitions. If a topic has 6 partitions and you start 8 consumers in the same group, 2 consumers will sit idle. Scale the partition count first, then scale consumers.
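
The idle-consumer effect is easy to see in a small simulation. The sketch below deals partitions out to consumers in turn; this is a simplified strategy chosen for illustration, and Kafka's real assignors (range, round-robin, sticky) differ in detail. The function and consumer names are hypothetical.

```python
def assign_partitions(num_partitions, consumers):
    """Deal partitions out in turn so each partition has exactly one owner."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# 6 partitions, 8 consumers in one group: two consumers get nothing.
consumers = [f"consumer-{i}" for i in range(8)]
assignment = assign_partitions(6, consumers)
idle = [c for c, parts in assignment.items() if not parts]

# 6 partitions, 2 consumers: each consumer owns three partitions.
small_group = assign_partitions(6, ["a", "b"])
```

With eight consumers, the last two end up with empty assignments; with two consumers, each handles three partitions. In both cases every partition has exactly one owner within the group.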

Different consumer groups are completely independent. A topic with 3 consumer groups has 3 separate offset pointers per partition — each group reads at its own pace, and one group falling behind never affects the others. This is how Kafka fans out a single event stream to analytics, search indexing, and audit logging simultaneously.

A single Kafka topic with 3 partitions, one producer, and two independent consumer groups reading at different offsets.

Kafka vs. Traditional Message Queues

To make the trade-offs concrete, compare a typical queue (RabbitMQ, SQS) against Kafka across the dimensions that matter most at scale:

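Throughput: Kafka can sustain hundreds of thousands to millions of messages per second per broker, depending on message size, batching, and hardware, because it batches records and writes them sequentially to disk. A typical RabbitMQ broker saturates at tens of thousands of messages per second (roughly 20 000–50 000 msg/s) because it maintains per-message state.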
Retention: Queues delete messages after acknowledgement. Kafka retains them for the configured period (e.g. 7 days), enabling replay and late-joining consumers.
Consumer model: Many queues push messages to consumers, and the broker tracks the delivery state of every message. Kafka consumers pull in batches of configurable size, and the broker stores only one committed offset per partition per consumer group rather than per-message state.
Ordering: Most queues offer only best-effort ordering once multiple consumers, retries, or redeliveries are involved. Kafka guarantees strict ordering within each partition.
Use case fit: Queues excel for task distribution (one consumer does the work). Kafka excels for event streaming (many independent consumers observe the same stream).

When to choose Kafka: Use Kafka when you need high throughput (>50 000 events/s), multiple independent consumers, event replay, or a permanent audit log. Use a traditional queue when the workload is job-queue style (one worker, process-and-delete semantics) and simplicity matters more than scale.

The Offset Commit Loop

Understanding how Kafka tracks consumer progress is essential for building reliable consumers. The flow works like this:

The consumer polls Kafka and receives a batch of records up to max.poll.records (default 500).
The consumer processes all records in the batch.
The consumer commits its position back to Kafka: the offset of the next record it expects to read, i.e. the last successfully processed offset plus one (stored in the internal __consumer_offsets topic).
If the consumer crashes before committing, it restarts from the last committed offset — reprocessing some records (at-least-once delivery).
If the consumer commits before processing succeeds, a crash means it skips those records (at-most-once delivery).

Most applications choose at-least-once delivery with idempotent processing — which Lesson 6 covers in depth. The key Kafka configuration levers are enable.auto.commit (default true, risky) and the explicit commitSync() / commitAsync() calls you should prefer in production.
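
The difference between the two delivery modes shows up clearly in a small simulation. The sketch below uses a plain Python list as the log and a hypothetical run_consumer function with a simulated crash; it models the commit ordering only and does not use a real Kafka client.

```python
def run_consumer(log, committed, crash_at=None, commit_first=False, batch_size=3):
    """Poll, process, and commit until the log is drained or a crash is simulated.

    `committed` is the offset of the next record to read. Returns the list of
    processed records and the committed offset at the moment the run ended.
    """
    processed = []
    while committed < len(log):
        start = committed
        batch = log[start:start + batch_size]      # poll
        end = start + len(batch)
        if commit_first:
            committed = end                        # commit before processing
        for offset, record in enumerate(batch, start=start):
            if offset == crash_at:
                return processed, committed        # simulated crash mid-batch
            processed.append(record)               # process
        if not commit_first:
            committed = end                        # commit after processing
    return processed, committed


log = ["r0", "r1", "r2", "r3", "r4", "r5"]

# At-least-once: crash while handling offset 4, then restart from the commit.
alo_first, alo_committed = run_consumer(log, 0, crash_at=4)
alo_second, _ = run_consumer(log, alo_committed)

# At-most-once: same crash, but the batch was committed before processing.
amo_first, amo_committed = run_consumer(log, 0, crash_at=4, commit_first=True)
amo_second, _ = run_consumer(log, amo_committed)
```

In the at-least-once run, r3 is processed twice (once before the crash, once after the restart), which is why idempotent processing matters. In the at-most-once run, r4 and r5 are never processed because their offsets were committed before the crash.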

The poll-process-commit loop: only commit the offset after processing succeeds to guarantee at-least-once delivery.

Replication and Durability

Every Kafka partition has a configurable replication factor (typically 3). One broker holds the leader replica for a partition; by default, all reads and writes go through the leader (since Kafka 2.4, consumers can optionally be configured to fetch from followers). The other brokers hold follower replicas that continuously fetch new records from the leader. If the leader broker fails, Kafka elects a new leader from the in-sync replica set (ISR), typically within seconds. Writes acknowledged with acks=all are not lost in this failover, provided min.insync.replicas was satisfied and unclean leader election is disabled (the default).

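The producer configuration acks=all (also written acks=-1) combined with the topic or broker setting min.insync.replicas=2 is the standard production setup for a replication factor of 3. With acks=all, the leader acknowledges a write only after every replica currently in the ISR has it; min.insync.replicas=2 makes the broker reject the write if fewer than 2 replicas are in sync, so an acknowledged record always exists on at least 2 replicas. At acks=1 (leader only), latency is lower and throughput can be higher, but you risk losing recently acknowledged messages if the leader crashes before followers catch up.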
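
The interaction between acks and min.insync.replicas can be summarised as a small decision function. This is a simplified model for illustration, not broker code: replicas_required_for_ack is a hypothetical helper, and NotEnoughReplicas is a local exception that mirrors the kind of error a producer sees when the ISR is too small.

```python
class NotEnoughReplicas(Exception):
    """Local stand-in for the error returned when the ISR is too small."""


def replicas_required_for_ack(acks, isr_size, min_insync_replicas):
    """Return how many replicas must hold a record before the producer is acked.

    Simplified model: min.insync.replicas is only enforced for acks=all.
    """
    if acks == "0":
        return 0                      # fire and forget, no acknowledgement
    if acks == "1":
        return 1                      # leader only
    if acks in ("all", "-1"):
        if isr_size < min_insync_replicas:
            raise NotEnoughReplicas(
                f"ISR has {isr_size} replicas, need {min_insync_replicas}"
            )
        return isr_size               # every in-sync replica must have it
    raise ValueError(f"unknown acks value: {acks!r}")


healthy = replicas_required_for_ack("all", isr_size=3, min_insync_replicas=2)
degraded = replicas_required_for_ack("all", isr_size=2, min_insync_replicas=2)
leader_only = replicas_required_for_ack("1", isr_size=1, min_insync_replicas=2)
```

With a full ISR of 3 the write waits for all three replicas, with a degraded ISR of 2 it waits for two, and with an ISR of 1 an acks=all write is rejected rather than acknowledged on a single copy. The acks=1 case is acknowledged regardless of ISR size, which is where the data-loss window comes from.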

Kafka in the wild: LinkedIn (Kafka's birthplace) processes over 7 trillion messages per day across its clusters. Netflix uses Kafka to fan out viewing events to dozens of downstream systems (recommendation engine, billing, analytics, and fraud detection), all from a single stream.