Idempotency & Deduplication
In any distributed messaging system, the same message can arrive more than once. A producer retries after a timeout; a broker crashes mid-acknowledgement; a network partition causes a delivery to be confirmed on one side but not the other. These are not edge cases — they are everyday realities at scale. The solution is to design every consumer to be idempotent and to use deduplication wherever idempotency alone is not enough.
What Idempotency Means
An operation is idempotent if performing it multiple times produces exactly the same result as performing it once. The term comes from mathematics: applying the same function repeatedly leaves the state unchanged after the first application.
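In symbols, model an operation as a function $f$ from the current state to the next state. Idempotence is then the identity below, and the email and wallet examples that follow either satisfy it or break it. The names $s_e$ (set email to $e$) and $d$ (debit $10) are notation introduced here for illustration only.

```latex
\begin{aligned}
&\text{Idempotence:}        && f(f(x)) = f(x) \quad \text{for every state } x \\
&\text{Set email to } e:    && s_e(u) = u[\text{email} := e], \qquad s_e(s_e(u)) = s_e(u) \\
&\text{Debit } \$10:        && d(w) = w - 10, \qquad d(d(w)) = w - 20 \neq d(w)
\end{aligned}
```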
Concrete examples:
- Idempotent: Setting a user's email to `alice@example.com`. Running the `UPDATE` ten times leaves exactly the same row.
- Not idempotent: Debiting $10 from a wallet. Running that operation ten times charges $100.
- Idempotent: Marking an order as `SHIPPED`. Transitioning from `SHIPPED` → `SHIPPED` is a no-op.
- Not idempotent: Incrementing a view counter. Each duplicate message inflates the count.
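A minimal sketch of what at-least-once delivery does to each kind of handler, assuming a hypothetical `Account` record and two hypothetical handlers, `set_email` and `debit`. Every message is delivered twice: the idempotent handler ends in the same state as a single delivery, while the debit runs twice.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Hypothetical account record used only for this illustration."""
    email: str
    balance_cents: int


def set_email(acct: Account, msg: dict) -> None:
    # Idempotent: the final state depends only on the message, not on how often it arrives.
    acct.email = msg["email"]


def debit(acct: Account, msg: dict) -> None:
    # Not idempotent: every delivery subtracts the amount again.
    acct.balance_cents -= msg["amount_cents"]


acct = Account(email="old@example.com", balance_cents=10_000)
email_msg = {"email": "alice@example.com"}
debit_msg = {"amount_cents": 1_000}  # $10

# Simulate at-least-once delivery: each message arrives twice.
for _ in range(2):
    set_email(acct, email_msg)
    debit(acct, debit_msg)

print(acct.email)          # alice@example.com
print(acct.balance_cents)  # 8000, i.e. $20 debited instead of $10
```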
Idempotency Keys
The standard pattern is to attach a stable, unique idempotency key to every message at the point of creation — typically a UUID generated by the producer. The consumer records which keys it has already processed. On receiving a message, it checks: have I seen this key before? If yes, it acknowledges and discards without re-executing the side effect.
The key must be:
- Globally unique: a UUID v4, or a domain-scoped composite key such as `order_id:event_type:attempt`, where `attempt` identifies a distinct logical attempt (for example, a customer's second checkout) and is never incremented by a transport-level retry.
- Stable across retries: the producer must send the same key on every retry of the same logical operation, not generate a fresh UUID each time.
- Stored durably: in Redis with a TTL, or in a database table, so the check survives consumer restarts.
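The producer side of these rules can be sketched as follows, assuming a hypothetical `send` callable and a hypothetical `TransientError` for retryable failures. The key is assigned once, when the logical operation is created, and the retry loop resends the identical message; `new_message`, `send_with_retries`, and `composite_key` are illustrative names, not a library API.

```python
import uuid
from typing import Callable


class TransientError(Exception):
    """Hypothetical retryable failure, e.g. a send timeout."""


def new_message(payload: dict) -> dict:
    # Assign the key ONCE, when the logical operation is created.
    return {"idempotency_key": str(uuid.uuid4()), **payload}


def send_with_retries(send: Callable[[dict], None], message: dict,
                      max_attempts: int = 5) -> None:
    """Resend the same message (and therefore the same key) on each retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            send(message)  # every retry reuses message["idempotency_key"]
            return
        except TransientError:
            if attempt == max_attempts:
                raise  # caller can resend `message` later; the key is unchanged


def composite_key(order_id: str, event_type: str, attempt: int) -> str:
    """Domain-scoped alternative to a UUID. `attempt` numbers a distinct
    logical attempt, never a transport-level retry."""
    return f"{order_id}:{event_type}:{attempt}"


# Usage: a hypothetical sender that times out twice, then succeeds.
seen_keys = []
outcomes = iter([False, False, True])  # True means the send succeeds


def flaky_send(message: dict) -> None:
    seen_keys.append(message["idempotency_key"])
    if not next(outcomes):
        raise TransientError("timeout")


msg = new_message({"order_id": "o-42"})
send_with_retries(flaky_send, msg)
print(len(seen_keys), set(seen_keys) == {msg["idempotency_key"]})  # 3 True
```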
Where to Store Processed Keys
The dedup store is a critical component. Common choices:
- Redis `SET NX` with TTL: fast (sub-millisecond) and suited to keys that can expire (e.g., after 24 h). `SET key 1 EX 86400 NX` replies `OK` on the first write (process the message) and a nil reply if the key already exists (skip it as a duplicate). This is the most common choice in high-throughput systems.
- Database unique constraint: a table with a `UNIQUE` index on the idempotency key. An `INSERT IGNORE` (MySQL) or `ON CONFLICT DO NOTHING` (PostgreSQL, SQLite) inserts the key atomically, and an affected-row count of zero tells you the message is a duplicate. Slightly slower but durable and transactional: you can update the business row and insert the key in one transaction.
- Broker-level deduplication: AWS SQS FIFO queues accept a `MessageDeduplicationId` and do not deliver a second message with the same ID sent within the 5-minute deduplication interval; RabbitMQ can provide similar behavior through a community deduplication plugin. This offloads the logic to the broker but does not protect against application-level replays outside that window.
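The Redis option can be sketched as follows. To keep it runnable without a server, the sketch assumes a small in-memory class, `InMemorySetNX` (hypothetical), that mimics the return values of redis-py's `set(name, value, ex=..., nx=True)`: `True` when the key was written, `None` when it already existed. `handle_message` and the `dedup:` key prefix are likewise illustrative names.

```python
import threading
import time
from typing import Callable, Dict, Optional


class InMemorySetNX:
    """Single-process stand-in for a Redis client. Mimics redis-py's
    set(name, value, ex=..., nx=True) return values. Values are not stored."""

    def __init__(self) -> None:
        self._expiry: Dict[str, float] = {}  # key -> monotonic expiry time
        self._lock = threading.Lock()         # Redis runs each command atomically

    def set(self, name: str, value: object, ex: Optional[int] = None,
            nx: bool = False) -> Optional[bool]:
        with self._lock:
            now = time.monotonic()
            expires_at = self._expiry.get(name)
            if nx and expires_at is not None and expires_at > now:
                return None  # key exists and has not expired: duplicate
            self._expiry[name] = now + ex if ex is not None else float("inf")
            return True

    def delete(self, name: str) -> int:
        with self._lock:
            return 1 if self._expiry.pop(name, None) is not None else 0


DEDUP_TTL_SECONDS = 86_400  # 24 h


def handle_message(store, msg: dict, side_effect: Callable[[dict], None]) -> bool:
    """Run side_effect at most once per idempotency key.
    Returns True if it ran, False if the message was a duplicate."""
    key = f"dedup:{msg['idempotency_key']}"
    # One atomic command, equivalent to: SET dedup:<key> 1 EX 86400 NX
    if not store.set(key, 1, ex=DEDUP_TTL_SECONDS, nx=True):
        return False  # duplicate: acknowledge and discard
    try:
        side_effect(msg)
    except Exception:
        store.delete(key)  # release the key so a redelivery can retry
        raise
    return True


store = InMemorySetNX()  # with redis-py this could be redis.Redis()
processed = []
msg = {"idempotency_key": "key-1", "body": "hello"}
first = handle_message(store, msg, processed.append)
second = handle_message(store, msg, processed.append)
print(first, second, len(processed))  # True False 1
```

Because the key is written before the side effect runs, a consumer that crashes mid-way leaves the key in place, and the redelivered message is skipped until the TTL expires. Recording the key in the same database transaction as the business change avoids that gap.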
Atomicity: The Double-Spend Problem
A subtle but critical correctness issue arises if you check the dedup store and execute the business logic as two separate steps. Between those two steps another thread could process the same message. The solution is an atomic check-and-set:
- With Redis: `SET key 1 EX 86400 NX` is a single atomic command, so there is no race condition.
- With a relational database: wrap the key insert and the business operation in a single transaction. The unique constraint will cause the second transaction to fail and roll back cleanly.
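Avoid running `SELECT` → decide → `INSERT` as three separate statements with no atomic primitive guarding them. Under concurrent load, two consumers can both pass the `SELECT` check and both proceed to execute, producing a duplicate. Wrapping the statements in a transaction is not enough on its own at the default isolation level of most databases; the unique constraint (or `SERIALIZABLE` isolation) is what closes the gap. This is a classic TOCTOU (time-of-check/time-of-use) race.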
Making Non-Idempotent Operations Safe
When you cannot rewrite the underlying operation to be naturally idempotent (e.g., charging a payment gateway), the pattern is to fence it with a status machine:
- Before calling the external service, write a record to the database with status `PENDING` and the idempotency key, inside a transaction.
- If that write fails with a unique-key violation, another worker has started or completed the operation; stop and return.
- Call the external service, then update the record to `COMPLETE` with the result.
- If a worker crashes after the `PENDING` row is committed but before the record reaches `COMPLETE`, a recovery job finds the stale `PENDING` row and retries, passing the same idempotency key to the external API (Stripe and PayPal, for example, both support this) so the payment gateway deduplicates on its side.
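The steps above can be sketched with `sqlite3` standing in for the database. `FakeGateway` is a hypothetical gateway that returns the original charge for a repeated key, as real gateways do when given an idempotency key; the `payments` table, `charge_once`, and `recover_pending` are likewise illustrative names rather than any specific library's API.

```python
import sqlite3
import uuid
from typing import List, Optional


class FakeGateway:
    """Hypothetical payment gateway that deduplicates on the idempotency key."""

    def __init__(self) -> None:
        self.charges = {}  # idempotency key -> charge id

    def charge(self, key: str, amount_cents: int) -> str:
        if key not in self.charges:
            self.charges[key] = f"ch_{uuid.uuid4().hex[:12]}"
        return self.charges[key]


SCHEMA = """
CREATE TABLE payments (
    idempotency_key TEXT PRIMARY KEY,
    amount_cents    INTEGER NOT NULL,
    status          TEXT NOT NULL CHECK (status IN ('PENDING', 'COMPLETE')),
    charge_id       TEXT
);
"""


def _call_and_complete(conn, gateway, key, amount_cents) -> str:
    # Call the external service with the SAME key, then record the result.
    charge_id = gateway.charge(key, amount_cents)
    with conn:
        conn.execute(
            "UPDATE payments SET status = 'COMPLETE', charge_id = ? WHERE idempotency_key = ?",
            (charge_id, key),
        )
    return charge_id


def charge_once(conn, gateway, key, amount_cents) -> Optional[str]:
    # Claim the operation: write a PENDING row in its own transaction.
    try:
        with conn:
            conn.execute(
                "INSERT INTO payments (idempotency_key, amount_cents, status) "
                "VALUES (?, ?, 'PENDING')",
                (key, amount_cents),
            )
    except sqlite3.IntegrityError:
        return None  # another worker started or finished this payment
    return _call_and_complete(conn, gateway, key, amount_cents)


def recover_pending(conn, gateway) -> List[str]:
    # Re-drive rows left PENDING by a crash. A production job would only pick
    # rows older than a timeout, so it does not race a worker still in flight.
    rows = conn.execute(
        "SELECT idempotency_key, amount_cents FROM payments WHERE status = 'PENDING'"
    ).fetchall()
    return [_call_and_complete(conn, gateway, k, amt) for k, amt in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
gw = FakeGateway()

first = charge_once(conn, gw, "pay-1", 1_000)
duplicate = charge_once(conn, gw, "pay-1", 1_000)  # None: already claimed

# Simulate a crash after the gateway call but before the COMPLETE update.
with conn:
    conn.execute(
        "INSERT INTO payments (idempotency_key, amount_cents, status) "
        "VALUES ('pay-2', 500, 'PENDING')"
    )
gw.charge("pay-2", 500)
recovered = recover_pending(conn, gw)
print(duplicate, len(gw.charges))  # None 2
```

The recovery path re-sends `pay-2` with its original key, so the fake gateway returns the existing charge instead of creating a second one.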
TTL and Key Expiry
Dedup keys do not need to live forever. Choose a TTL that covers your retry window with a comfortable margin. If your retry policy gives up after 1 hour with exponential back-off, a 24-hour TTL is safe. For payment operations where a customer might dispute a charge days later, keep keys for 7–30 days. When keys expire, you accept that a message replayed after the window will be processed again — make sure that is acceptable for your use case, or use permanent database records for high-stakes operations.
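The arithmetic behind "covers your retry window with a comfortable margin" can be sketched as below. The back-off parameters (1 s first delay, doubling, 12 retries), the 2x margin, and the 24-hour floor are assumptions chosen to match the example in the text; `retry_window` and `dedup_ttl` are hypothetical helpers, and the window ignores jitter and the time each attempt itself takes.

```python
from datetime import timedelta


def retry_window(first_delay_s: float, factor: float, retries: int,
                 max_delay_s: float = float("inf")) -> timedelta:
    """Sum of all back-off delays: roughly the time from the first failed
    attempt to the last retry (ignores jitter and attempt durations)."""
    total = 0.0
    delay = first_delay_s
    for _ in range(retries):
        total += min(delay, max_delay_s)
        delay *= factor
    return timedelta(seconds=total)


def dedup_ttl(window: timedelta, margin: float = 2.0,
              floor: timedelta = timedelta(hours=24)) -> timedelta:
    """Cover the retry window with a safety margin, never going below `floor`."""
    return max(window * margin, floor)


# 1 + 2 + 4 + ... + 2048 seconds = 4095 s, just over one hour.
window = retry_window(first_delay_s=1, factor=2, retries=12)
print(window)                                       # 1:08:15
print(dedup_ttl(window))                            # 1 day, 0:00:00
print(dedup_ttl(window, floor=timedelta(days=30)))  # 30 days, 0:00:00
```

For payment keys, raising `floor` to the 7 to 30 day range mentioned above dominates the computed window entirely.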
Practical Checklist
- Every producer assigns a UUID idempotency key once and reuses it on all retries.
- Every consumer checks the key atomically before executing the side effect.
- The dedup store uses an atomic primitive (`SET NX` or a unique constraint).
- Business logic and key recording happen in the same transaction where possible.
- TTL is set long enough to cover the full retry window, plus margin.
- External services (payment gateways, email providers) receive the same key so they can deduplicate independently.