Delivery Guarantees: At-Most-Once, At-Least-Once & Exactly-Once
Every distributed messaging system must answer one fundamental question: when a producer sends a message, how many times will the consumer receive and process it? The answer is almost never "exactly once" by default — it depends on a spectrum of guarantees, each with concrete trade-offs in latency, throughput, complexity, and correctness. Getting this wrong can mean lost financial transactions, duplicate order confirmations, or corrupted analytics.
There are three standard delivery semantics. Understanding them — and knowing which one you actually need — is one of the most important decisions in async system design.
At-Most-Once Delivery
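The system fires and forgets. A message is delivered zero or one time: it will never be delivered twice, but it might be lost entirely. The producer sends the message without waiting for an acknowledgment, and nothing retries it: after a single delivery attempt the message is discarded, whether or not the consumer succeeded. The same semantics arise on the consumer side when a consumer acknowledges a message (or commits its offset) before processing it, because a crash during processing then loses the message.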
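A toy, in-memory model makes the "zero or one time" behaviour concrete. The class and function names are hypothetical and not any broker's API. The broker removes a message the moment it hands it out, so a consumer crash while handling that message simply loses it.

```python
from collections import deque


class AtMostOnceBroker:
    """Toy broker: hands each message out once and forgets it immediately."""

    def __init__(self):
        self.queue = deque()

    def publish(self, msg):
        # Fire-and-forget: no acknowledgment is requested or awaited.
        self.queue.append(msg)

    def deliver(self):
        # The message leaves the queue before the consumer does any work,
        # so there is nothing left to retry if the consumer fails.
        return self.queue.popleft() if self.queue else None


def consume(broker, processed, crash_on=None):
    while (msg := broker.deliver()) is not None:
        if msg == crash_on:
            # Simulated crash while handling this message; after a restart the
            # consumer carries on with the next message, and this one is gone.
            continue
        processed.append(msg)


broker = AtMostOnceBroker()
for event in ["e1", "e2", "e3"]:
    broker.publish(event)

processed = []
consume(broker, processed, crash_on="e2")
print(processed)  # ['e1', 'e3']
```

No message is ever seen twice, but "e2" never reaches the processed list.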
Where it appears: UDP-based telemetry, fire-and-forget log shipping, real-time metrics (where losing a single data point is acceptable), IoT sensor readings where stale data is worthless anyway.
Trade-offs: Lowest latency, highest throughput, zero overhead for deduplication. But data loss is a real possibility on any network blip or consumer crash. Use this only when losing messages is cheaper than the cost of reliability.
Real-world example: A game analytics pipeline emitting 500,000 events/second for player telemetry. Losing 0.1% of those events in a network spike has negligible impact on aggregate reports. Per-message acknowledgment and retry would cost throughput and add latency for no practical gain.
At-Least-Once Delivery
The broker retries until it receives an acknowledgment. A message is delivered one or more times — no data is lost, but duplicates are possible. The consumer must explicitly acknowledge (ACK) each message. If the broker does not receive an ACK within a timeout, it re-enqueues the message and delivers it again.
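The ACK-and-timeout loop can be sketched with a toy broker that uses a logical clock instead of real time. All names here are hypothetical. A message stays "in flight" until it is ACKed, and anything not ACKed within the timeout goes back on the queue.

```python
from collections import deque


class AtLeastOnceBroker:
    """Toy broker: keeps each message in flight until it is ACKed."""

    def __init__(self, ack_timeout):
        self.ack_timeout = ack_timeout
        self.ready = deque()   # (msg_id, payload) waiting to be delivered
        self.in_flight = {}    # msg_id -> (payload, time it was delivered)

    def publish(self, msg_id, payload):
        self.ready.append((msg_id, payload))

    def deliver(self, now):
        if not self.ready:
            return None
        msg_id, payload = self.ready.popleft()
        self.in_flight[msg_id] = (payload, now)
        return msg_id, payload

    def ack(self, msg_id):
        # Only an ACK removes the message for good.
        self.in_flight.pop(msg_id, None)

    def requeue_expired(self, now):
        # Any message not ACKed within the timeout is re-enqueued.
        for msg_id, (payload, sent_at) in list(self.in_flight.items()):
            if now - sent_at >= self.ack_timeout:
                del self.in_flight[msg_id]
                self.ready.append((msg_id, payload))


broker = AtLeastOnceBroker(ack_timeout=5)
broker.publish("order-42", {"amount": 100})

deliveries = []
msg = broker.deliver(now=0)
deliveries.append(msg[0])
# The consumer processes the message but crashes before calling broker.ack().

broker.requeue_expired(now=6)  # timeout elapsed and no ACK arrived
msg = broker.deliver(now=6)
deliveries.append(msg[0])
broker.ack(msg[0])             # the second attempt is ACKed

print(deliveries)  # ['order-42', 'order-42']
```

Nothing is lost, but the same message is delivered twice.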
Where it appears: RabbitMQ with manual consumer acknowledgments, Kafka with acks=all on the producer and consumer offsets committed only after processing completes, SQS standard queues, most enterprise job queues.
Trade-offs: Strong durability guarantee: messages survive broker and consumer crashes. But duplicates are possible and must be planned for. If the consumer finishes processing and then crashes, or its ACK is lost or arrives after the timeout, the broker re-delivers a message that was already handled. Your consumer logic must be idempotent (covered in Lesson 6).
Real-world example: An order-processing service. If the consumer crashes after charging the card but before ACKing the message, the broker re-delivers. The consumer must detect the duplicate (e.g., by checking if an order with that ID already exists) rather than charging the card a second time.
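A minimal sketch of the duplicate check described above, assuming an in-memory dict stands in for the orders table and a hypothetical `PaymentGateway` stands in for the card processor. The consumer skips the side effect when the order ID has already been recorded.

```python
class PaymentGateway:
    """Hypothetical stand-in for a card processor; records every charge."""

    def __init__(self):
        self.charges = []

    def charge(self, order_id, amount):
        self.charges.append((order_id, amount))


class OrderConsumer:
    def __init__(self, gateway):
        self.gateway = gateway
        self.orders = {}  # order_id -> status; stands in for an orders table

    def handle(self, message):
        order_id = message["order_id"]
        if order_id in self.orders:
            # Duplicate delivery: the charge already happened, so skip it.
            return "duplicate"
        # Caveat: a crash between charge() and the line below still leaves a
        # gap. Real systems close it by making the check and record atomic, or
        # by passing order_id to the processor as an idempotency key.
        self.gateway.charge(order_id, message["amount"])
        self.orders[order_id] = "charged"
        return "processed"


gateway = PaymentGateway()
consumer = OrderConsumer(gateway)
msg = {"order_id": "order-42", "amount": 100}

results = [consumer.handle(msg), consumer.handle(msg)]  # broker re-delivered
print(results)               # ['processed', 'duplicate']
print(len(gateway.charges))  # 1
```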
Exactly-Once Delivery
The holy grail: a message is delivered and processed exactly one time, regardless of retries, crashes, or network failures. No data is lost, and no duplicates appear. In practice this is implemented through a combination of idempotent producers, broker-level deduplication, and transactional consumers.
How Kafka implements it: Kafka's exactly-once semantics (EOS) use three mechanisms together:
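- Idempotent producer: the broker assigns each producer a producer ID, and the producer stamps every batch with a per-partition sequence number. The broker discards retried batches it has already written, so producer retries do not create duplicates within that producer session.
- Transactions: a producer can atomically write to multiple partitions; either all writes commit or none do.
- Read-process-write atomicity: the consumed offsets are committed as part of the same producer transaction as the processed output (in the Java client, via sendOffsetsToTransaction), so a crash mid-processing leaves either both or neither committed. Downstream consumers read with isolation.level=read_committed so they never see output from aborted transactions.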
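The idempotent-producer mechanism can be modelled as a broker-side sequence check. This is a simplified sketch, not Kafka's implementation: the real broker tracks sequence numbers per producer ID and partition and keeps metadata for several recent batches, while this toy keeps only the last sequence number per producer for a single partition.

```python
class Partition:
    """Toy partition that deduplicates retries from an idempotent producer."""

    def __init__(self):
        self.log = []
        self.last_seq = {}  # producer_id -> last appended sequence number

    def append(self, producer_id, seq, record):
        last = self.last_seq.get(producer_id, -1)
        if seq <= last:
            return "duplicate"  # a retry of a batch that is already written
        if seq != last + 1:
            raise ValueError(f"out-of-order sequence {seq}, expected {last + 1}")
        self.log.append(record)
        self.last_seq[producer_id] = seq
        return "appended"


p = Partition()
print(p.append("pid-7", 0, "debit:100"))  # appended
# The ACK for seq 0 is lost, so the producer retries the same batch:
print(p.append("pid-7", 0, "debit:100"))  # duplicate
print(p.append("pid-7", 1, "credit:40"))  # appended
print(p.log)                              # ['debit:100', 'credit:40']
```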
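Trade-offs: Exactly-once adds overhead compared with at-least-once: extra round-trips to the transaction coordinator, commit markers written to every partition in a transaction, and higher end-to-end latency, because read_committed consumers see a transaction's output only after it commits. How much throughput this costs depends heavily on transaction size and commit frequency, so measure it for your workload. It also requires the entire pipeline to participate. Kafka's transactions cover only Kafka topics and offsets, so if the consumer writes to an external database, exactly-once at that boundary has to come from the sink itself (idempotent writes, or storing the consumed offsets in the same database transaction as the data); a sink that supports neither cannot give true exactly-once.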
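Real-world example: A payment ledger consuming from Kafka. Each event represents a debit or credit. Delivering it twice would corrupt account balances; losing it would cause unexplained discrepancies. Because Kafka's transactions stop at the PostgreSQL boundary, a common approach is to write each event to PostgreSQL in a single database transaction that also records the consumed offset (or event ID), so a re-delivered event is recognized and skipped; Kafka EOS covers any Kafka-to-Kafka stages upstream.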
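The "offset stored with the data" pattern can be sketched with the standard-library `sqlite3` module standing in for PostgreSQL. The table names, the `payments-0` partition label, and the helper functions are hypothetical. The ledger row and the consumer's next offset commit in one database transaction, so after a crash either both exist or neither does. On restart, a real consumer would also seek to the stored offset rather than rely on Kafka's committed offset.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ledger (account TEXT NOT NULL, delta INTEGER NOT NULL);
    CREATE TABLE consumer_offsets (
        topic_partition TEXT PRIMARY KEY,
        next_offset     INTEGER NOT NULL
    );
""")


def stored_offset(conn, tp):
    """Offset of the next event this consumer still has to apply."""
    row = conn.execute(
        "SELECT next_offset FROM consumer_offsets WHERE topic_partition = ?", (tp,)
    ).fetchone()
    return row[0] if row else 0


def apply_event(conn, tp, offset, account, delta):
    """Write the ledger entry and advance the offset in ONE transaction."""
    with conn:  # commits on normal exit, rolls back if an exception escapes
        if offset < stored_offset(conn, tp):
            return False  # already applied: a re-delivery becomes a no-op
        conn.execute("INSERT INTO ledger VALUES (?, ?)", (account, delta))
        conn.execute(
            "INSERT OR REPLACE INTO consumer_offsets VALUES (?, ?)",
            (tp, offset + 1),
        )
        return True


tp = "payments-0"
for offset, account, delta in [(0, "alice", -100), (1, "bob", 100)]:
    apply_event(conn, tp, offset, account, delta)

# Simulate Kafka re-delivering offset 1 after a consumer restart.
print(apply_event(conn, tp, 1, "bob", 100))  # False
print(conn.execute(
    "SELECT SUM(delta) FROM ledger WHERE account = 'bob'"
).fetchone()[0])  # 100
```

The check-then-insert relies on a single consumer owning each partition, which is how Kafka consumer groups assign partitions.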
How Failures Trigger Duplicates — The ACK Gap
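The most common cause of duplicates under at-least-once semantics is the ACK gap: the window between when the consumer finishes processing and when the broker records its acknowledgment. If the consumer crashes, restarts, or loses its ACK inside that window, the broker never learns the message was handled and delivers it again.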
This gap is unavoidable in distributed systems. The solution is not to eliminate the gap but to make your consumer idempotent — designed so that processing the same message twice has the same effect as processing it once. Idempotency is covered in depth in Lesson 6.
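One way to see what "same effect as processing it once" means is to compare two handlers for a hypothetical inventory message that is delivered twice because of the ACK gap. The message fields `delta` and `new_level` are assumptions for illustration. Adding a delta is not idempotent; writing an absolute value carried in the message is.

```python
def apply_increment(state, msg):
    # NOT idempotent: every delivery adds the delta again.
    state["stock"] = state.get("stock", 0) + msg["delta"]


def apply_set(state, msg):
    # Idempotent: the message carries the resulting level, so a replay
    # leaves the state unchanged.
    state["stock"] = msg["new_level"]


msg = {"delta": 5, "new_level": 15}

a = {"stock": 10}
for _ in range(2):  # delivered twice because of the ACK gap
    apply_increment(a, msg)

b = {"stock": 10}
for _ in range(2):
    apply_set(b, msg)

print(a["stock"], b["stock"])  # 20 15
```

Absolute-value messages are only safe when messages are applied in order; when that cannot be guaranteed, a deduplication check on a message ID is the usual alternative.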
Choosing the Right Guarantee
Pick based on what your business can tolerate:
- At-most-once — use when losing messages is acceptable and throughput or latency is the priority. Metrics, telemetry, real-time recommendations, live sports scores.
- At-least-once — use when you cannot lose data and can make your consumer idempotent. This is the correct default for most production systems: order processing, email delivery, inventory updates.
- Exactly-once — use only when both loss and duplication are genuinely unacceptable and the entire pipeline supports it. Financial ledgers, billing systems, audit logs where idempotency at the application layer is impractical.
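For Kafka specifically, the three choices map onto a handful of client settings. The sketch below expresses that mapping as plain Python data. The property names (`acks`, `retries`, `enable.idempotence`, `transactional.id`, `enable.auto.commit`, `isolation.level`) are real Apache Kafka client configs, but the values, the `ledger-writer-1` ID, and the nesting are illustrative assumptions, not a complete or recommended configuration. Where offsets are committed (before or after processing) is application logic and appears only in comments.

```python
# Property names are Apache Kafka client configs; values are illustrative.
KAFKA_SETTINGS = {
    "at-most-once": {
        "producer": {"acks": "0", "retries": "0"},    # don't wait, don't retry
        "consumer": {"enable.auto.commit": "false"},  # app commits BEFORE processing
    },
    "at-least-once": {
        # Retry until delivery.timeout.ms expires; retries may duplicate.
        "producer": {"acks": "all", "retries": "2147483647"},
        "consumer": {"enable.auto.commit": "false"},  # app commits AFTER processing
    },
    "exactly-once": {
        "producer": {
            "acks": "all",
            "enable.idempotence": "true",
            "transactional.id": "ledger-writer-1",    # hypothetical; stable per instance
        },
        "consumer": {
            "enable.auto.commit": "false",            # offsets go through the transaction
            "isolation.level": "read_committed",      # hide aborted transactional writes
        },
    },
}
```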
Quick Reference
| Semantic | Data Loss? | Duplicates? | Complexity | Use When |
|---|---|---|---|---|
| At-most-once | Possible | Never | Low | Metrics, telemetry |
| At-least-once | Never | Possible | Medium | Orders, emails (idempotent) |
| Exactly-once | Never | Never | High | Financial ledgers, billing |