Asynchronous Processing & Messaging

Delivery Guarantees: At-Most-Once, At-Least-Once & Exactly-Once

18 min Lesson 5 of 10

Every distributed messaging system must answer one fundamental question: when a producer sends a message, how many times will the consumer receive and process it? The answer is almost never "exactly once" by default — it depends on a spectrum of guarantees, each with concrete trade-offs in latency, throughput, complexity, and correctness. Getting this wrong can mean lost financial transactions, duplicate order confirmations, or corrupted analytics.

There are three standard delivery semantics. Understanding them — and knowing which one you actually need — is one of the most important decisions in async system design.

At-Most-Once Delivery

The broker fires and forgets. A message is delivered zero or one time — it will never be delivered twice, but it might be lost entirely. The producer sends the message without waiting for acknowledgment; the broker discards the message immediately after a single delivery attempt regardless of whether the consumer succeeded.

Where it appears: UDP-based telemetry, fire-and-forget log shipping, real-time metrics (where losing a single data point is acceptable), IoT sensor readings where stale data is worthless anyway.

Trade-offs: Lowest latency, highest throughput, zero overhead for deduplication. But data loss is a real possibility on any network blip or consumer crash. Use this only when losing messages is cheaper than the cost of reliability.

Real-world example: A game analytics pipeline emitting 500,000 events/second for player telemetry. Losing 0.1% of those events in a network spike has negligible impact on aggregate reports. The overhead of acknowledgment and retry would cut throughput noticeably for no practical gain.
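
The fire-and-forget behaviour described above can be sketched with a small in-memory model. This is only an illustration: `AtMostOnceBroker`, `loss_rate`, and `run_demo` are hypothetical names, and the random drop stands in for a network blip rather than modelling any real broker.

```python
import random


class AtMostOnceBroker:
    """Hypothetical in-memory broker: one delivery attempt, no ACK, no retry."""

    def __init__(self, loss_rate, seed=0):
        self.loss_rate = loss_rate
        self._rng = random.Random(seed)  # seeded so runs are repeatable

    def publish(self, message, consumer):
        # Simulated network blip: the message is dropped and never retried.
        if self._rng.random() < self.loss_rate:
            return
        try:
            consumer(message)
        except Exception:
            # The consumer failed, but the broker has already forgotten
            # the message, so it is lost rather than redelivered.
            pass


def run_demo(n_events=10_000, loss_rate=0.001):
    """Publish n_events and return the list of events the consumer saw."""
    received = []
    broker = AtMostOnceBroker(loss_rate)
    for event_id in range(n_events):
        broker.publish(event_id, received.append)
    return received


if __name__ == "__main__":
    seen = run_demo()
    print(f"received {len(seen)} of 10000 events, duplicates: {len(seen) - len(set(seen))}")
```

Whatever the loss rate, the consumer never sees an event twice; the only failure mode is a missing event.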

At-Least-Once Delivery

The broker retries until it receives an acknowledgment. A message is delivered one or more times — no data is lost, but duplicates are possible. The consumer must explicitly acknowledge (ACK) each message. If the broker does not receive an ACK within a timeout, it re-enqueues the message and delivers it again.

Where it appears: RabbitMQ with manual acknowledgments (auto-ack disabled), Kafka with acks=all on the producer and consumer offsets committed only after processing completes, SQS standard queues, most enterprise job queues.

Trade-offs: Strong durability guarantee: messages survive broker and consumer crashes. But duplicates must be expected. The broker re-delivers whenever an ACK fails to arrive, whether the consumer crashed after processing but before it could ACK, or the ACK itself was lost or timed out on the network. Your consumer logic must therefore be idempotent (covered in Lesson 6).

Real-world example: An order-processing service. If the consumer crashes after charging the card but before ACKing the message, the broker re-delivers. The consumer must detect the duplicate (e.g., by checking if an order with that ID already exists) rather than charging the card a second time.
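
A minimal sketch of the order-processing scenario, assuming an in-memory broker and consumer. `AtLeastOnceBroker`, `OrderConsumer`, and `ConsumerCrash` are hypothetical names, and the `charges` list stands in for calls to a payment gateway.

```python
class ConsumerCrash(Exception):
    """Simulates the consumer dying before it can send an ACK."""


class AtLeastOnceBroker:
    """Hypothetical broker that redelivers until the handler returns (ACK)."""

    def __init__(self, max_attempts=5):
        self.max_attempts = max_attempts

    def deliver(self, message, handler):
        """Return the number of delivery attempts needed to get an ACK."""
        for attempt in range(1, self.max_attempts + 1):
            try:
                handler(message)
                return attempt  # handler returned normally: treated as ACK
            except ConsumerCrash:
                continue  # no ACK received: deliver the same message again
        raise RuntimeError("message was never acknowledged")


class OrderConsumer:
    """Charges a card once per order ID, even if the message arrives twice."""

    def __init__(self, crashes_before_ack=1):
        self.charges = []              # stand-in for payment gateway calls
        self.processed_orders = set()  # stand-in for an orders table
        self._crashes_left = crashes_before_ack

    def handle(self, message):
        order_id = message["order_id"]
        if order_id not in self.processed_orders:
            self.charges.append((order_id, message["amount_cents"]))
            self.processed_orders.add(order_id)
        # Simulate a crash after the charge but before the ACK.
        if self._crashes_left > 0:
            self._crashes_left -= 1
            raise ConsumerCrash()


if __name__ == "__main__":
    consumer = OrderConsumer()
    attempts = AtLeastOnceBroker().deliver(
        {"order_id": "o-1", "amount_cents": 1999}, consumer.handle
    )
    print(attempts, consumer.charges)
```

In this sketch the duplicate check and the charge share one in-memory object. In a real service the check and the side effect need to be atomic, or the downstream call must itself accept an idempotency key.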

[Figure: Three delivery guarantee semantics compared, each drawn as Producer → Broker → Consumer. At-most-once: send, deliver (maybe); may lose, no duplicates. At-least-once: send + ACK, deliver 1+; no loss, duplicates possible. Exactly-once: idempotent send, broker with dedup store, transactional consumer, deliver exactly 1; no loss, no duplicates.]
Side-by-side comparison of the three delivery semantics — arrows show what the consumer receives for a single message sent by the producer.

Exactly-Once Delivery

The holy grail: a message is delivered and processed exactly one time, regardless of retries, crashes, or network failures. No data is lost, and no duplicates appear. In practice this is implemented through a combination of idempotent producers, broker-level deduplication, and transactional consumers.

How Kafka implements it: Kafka's exactly-once semantics (EOS) use three mechanisms together:

  1. Idempotent producer: the broker assigns each producer an ID, and the producer attaches a per-partition sequence number to every batch, so the broker can discard a batch that is re-sent within a producer session.
  2. Transactions: a producer can atomically write to multiple partitions; either all writes commit or none do.
  3. Read-process-write atomicity: the consumed offsets are committed inside the same transaction as the processed output, so a crash mid-processing leaves the system in a consistent state.
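
The first mechanism, sequence-number deduplication, can be modelled in a few lines. This is a simplified sketch of the idea and not Kafka's implementation: `PartitionLog` and its `append` method are hypothetical, and it tracks a single sequence per producer for a single partition.

```python
class PartitionLog:
    """Hypothetical broker-side log for one partition with sequence dedup."""

    def __init__(self):
        self.records = []
        self._last_seq = {}  # producer_id -> highest sequence number appended

    def append(self, producer_id, seq, value):
        """Append value once. Return True if written, False if a duplicate."""
        last = self._last_seq.get(producer_id, -1)
        if seq <= last:
            # A retry of something already in the log: acknowledge it
            # without writing a second copy.
            return False
        if seq != last + 1:
            # A gap means an earlier message went missing; refuse to reorder.
            raise ValueError(f"expected sequence {last + 1}, got {seq}")
        self.records.append(value)
        self._last_seq[producer_id] = seq
        return True


if __name__ == "__main__":
    log = PartitionLog()
    log.append("producer-1", 0, "debit 500")
    log.append("producer-1", 0, "debit 500")  # retry after a lost ACK
    log.append("producer-1", 1, "credit 200")
    print(log.records)
```

The producer can retry freely after a lost acknowledgment, because the broker recognises the repeated sequence number and keeps only one copy.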

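Trade-offs: Exactly-once costs throughput and latency relative to at-least-once because every transaction adds two-phase-commit-style coordination. Figures of 20–30% are sometimes quoted for Kafka, but the actual overhead depends heavily on message size and on how often transactions commit, so measure it on your own workload. It also requires the entire pipeline to participate: Kafka transactions cover only Kafka topics and offsets, so if the consumer writes to an external database, the guarantee stops at that boundary unless the write is idempotent or the consumed offset is stored in the same database transaction as the result.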

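Real-world example: A payment ledger consuming from Kafka. Each event represents a debit or credit. Delivering it twice would corrupt account balances; losing it would cause unexplained discrepancies. Because Kafka's transactions do not extend into PostgreSQL, the ledger write and a record of the event ID (or consumed offset) should commit in the same PostgreSQL transaction, so that a re-delivered event is detected and skipped.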

Key insight: "Exactly-once delivery" as a network primitive is theoretically impossible without coordination. What modern systems call "exactly-once" is really at-least-once delivery + idempotent processing — the broker might deliver more than once, but the consumer is designed so that processing the same message twice produces the same result as processing it once.
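
One way to get "at-least-once delivery + idempotent processing" is to record the event ID and apply its effect in a single database transaction. The sketch below uses Python's built-in `sqlite3` as a stand-in for a transactional database; the table names, `open_ledger`, and `apply_event` are assumptions made for illustration.

```python
import sqlite3


def open_ledger():
    """Create an in-memory ledger with a dedup table and a balances table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE balances (account TEXT PRIMARY KEY, cents INTEGER NOT NULL)"
    )
    conn.commit()
    return conn


def apply_event(conn, event_id, account, delta_cents):
    """Apply a debit or credit once. Return False if event_id was seen before."""
    try:
        with conn:  # one transaction: commit on success, roll back on error
            # The primary key rejects a second insert of the same event ID.
            conn.execute(
                "INSERT INTO processed_events (event_id) VALUES (?)", (event_id,)
            )
            cur = conn.execute(
                "UPDATE balances SET cents = cents + ? WHERE account = ?",
                (delta_cents, account),
            )
            if cur.rowcount == 0:  # first event for this account
                conn.execute(
                    "INSERT INTO balances (account, cents) VALUES (?, ?)",
                    (account, delta_cents),
                )
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate delivery: nothing was changed


if __name__ == "__main__":
    conn = open_ledger()
    apply_event(conn, "evt-1", "acct-A", 500)
    apply_event(conn, "evt-1", "acct-A", 500)  # redelivery, ignored
    print(conn.execute("SELECT cents FROM balances").fetchall())
```

Because the dedup record and the balance update commit together, a crash between them cannot leave the event half-applied, and a redelivered event changes nothing.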

How Failures Trigger Duplicates — The ACK Gap

The most common cause of duplicates under at-least-once semantics is the ACK gap: the window between when the consumer finishes processing a message and when the broker receives the acknowledgment.

[Figure: The ACK gap on a timeline. Broker delivers at t=0; consumer finishes processing at t=50ms; consumer crashes at t=80ms before the ACK is sent; broker re-delivers at t=30s after the timeout expires. The interval between processing and the ACK is the duplicate window.]
The ACK gap: the consumer processes the message at t=50ms but crashes at t=80ms before sending the ACK. The broker re-delivers at t=30s, causing a duplicate.

This gap is unavoidable in distributed systems. The solution is not to eliminate the gap but to make your consumer idempotent — designed so that processing the same message twice has the same effect as processing it once. Idempotency is covered in depth in Lesson 6.
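
The timeline above can be replayed with a queue that hides a delivered message for a fixed timeout and shows it again if no ACK arrives. This is a sketch driven by a manual clock: `SimQueue`, `visibility_timeout`, and `simulate_ack_gap` are hypothetical names, and the 30-second timeout simply matches the example.

```python
class SimQueue:
    """Hypothetical queue with a visibility timeout and a caller-supplied clock."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self._messages = {}  # msg_id -> [body, visible_at, delivery_count]

    def send(self, msg_id, body):
        self._messages[msg_id] = [body, 0.0, 0]

    def receive(self, now):
        """Return (msg_id, body, delivery_count) or None if nothing is visible."""
        for msg_id, entry in self._messages.items():
            if now >= entry[1]:
                entry[1] = now + self.visibility_timeout  # hide until timeout
                entry[2] += 1
                return msg_id, entry[0], entry[2]
        return None

    def ack(self, msg_id):
        """Acknowledge a message so it is never delivered again."""
        self._messages.pop(msg_id, None)


def simulate_ack_gap():
    """Replay the timeline: deliver, process, crash before ACK, redeliver."""
    queue = SimQueue(visibility_timeout=30.0)
    queue.send("m1", "charge order 42")
    deliveries = []
    deliveries.append(queue.receive(now=0.0))   # t=0: first delivery
    # t=50ms: processing finishes; t=80ms: consumer crashes, no ACK is sent.
    deliveries.append(queue.receive(now=1.0))   # still hidden, nothing to read
    deliveries.append(queue.receive(now=30.0))  # timeout expired: duplicate
    queue.ack("m1")                             # second attempt ACKs in time
    deliveries.append(queue.receive(now=60.0))  # nothing left
    return deliveries


if __name__ == "__main__":
    for item in simulate_ack_gap():
        print(item)
```

The second delivery carries the same message ID as the first, which is exactly what an idempotent consumer keys on to skip the repeat.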

Choosing the Right Guarantee

Pick based on what your business can tolerate:

  • At-most-once: use when losing messages is acceptable and throughput or latency is the priority. Metrics, telemetry, real-time recommendations, live sports scores.
  • At-least-once: use when you cannot lose data and can make your consumer idempotent. This is the correct default for most production systems: order processing, email delivery, inventory updates.
  • Exactly-once: use only when both loss and duplication are genuinely unacceptable and the entire pipeline supports it. Financial ledgers, billing systems, audit logs where idempotency at the application layer is impractical.

Practical default: Design for at-least-once and make your consumers idempotent. This gives you near-zero data loss with manageable complexity, and works with virtually every message broker. Reserve exactly-once for cases where you can prove idempotency is architecturally impossible.

Do not trust broker labels blindly. Many services advertise "exactly-once" but deliver it only within a single partition, within a single session, or only at the broker layer, not end-to-end. Always verify what guarantee applies at every hop: producer → broker, broker → consumer, consumer → downstream storage. A chain is only as strong as its weakest link.
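
The selection rules above can be written as a small decision helper. This is only a restatement of the guidance in code; `choose_guarantee` and its parameter names are hypothetical.

```python
def choose_guarantee(loss_acceptable, consumer_can_be_idempotent,
                     pipeline_supports_transactions):
    """Map the three questions from this section to a delivery semantic."""
    if loss_acceptable:
        # Throughput and latency matter more than completeness.
        return "at-most-once"
    if consumer_can_be_idempotent:
        # The practical default: no loss, duplicates absorbed by the consumer.
        return "at-least-once"
    if pipeline_supports_transactions:
        # Only when every hop can take part in the transaction.
        return "exactly-once"
    raise ValueError(
        "no guarantee fits: make the consumer idempotent "
        "or use a pipeline that supports transactions end-to-end"
    )


if __name__ == "__main__":
    print(choose_guarantee(True, False, False))   # telemetry
    print(choose_guarantee(False, True, False))   # order processing
    print(choose_guarantee(False, False, True))   # financial ledger
```

The order of the checks encodes the practical default: exactly-once is considered only after at-least-once with an idempotent consumer has been ruled out.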

Quick Reference

Semantic        Data loss?   Duplicates?   Complexity   Use when
-------------   ----------   -----------   ----------   ---------------------------
At-most-once    Possible     Never         Low          Metrics, telemetry
At-least-once   Never        Possible      Medium       Orders, emails (idempotent)
Exactly-once    Never        Never         High         Financial ledgers, billing