Data Consistency & Replication

Project: Choose a Consistency Strategy

18 min Lesson 10 of 10

Project: Choose a Consistency Strategy

Every concept in this tutorial — CAP, consistency models, quorums, replication topologies, Raft, 2PC, Sagas — converges on a single engineering decision: what consistency guarantee does my system actually need, and how do I implement it without over-engineering? This capstone lesson works through that decision end to end, using a realistic e-commerce platform as the running case study. The goal is a repeatable framework you can apply to any system you design.

The Sample System: ShopFleet

ShopFleet is a mid-scale e-commerce platform: 2 million daily active users, 50,000 orders per day, deployed across three AWS regions (us-east-1, eu-west-1, ap-southeast-1). It has six core services:

Product Catalog — SKU data, images, descriptions
Inventory — real-time stock counts per SKU per warehouse
Cart — session-scoped shopping baskets
Orders — order lifecycle (created → paid → fulfilled → shipped)
Payments — integration with Stripe / bank rails
User Profiles — account settings, saved addresses, preferences

Each service has its own database. There is no shared datastore. This is a classic microservices topology — and the consistency problem lives at the seams between these services.

Step 1 — Classify Each Service by Read/Write Profile and Staleness Tolerance

Before picking a consistency model, you must understand two axes for each service: how stale can reads safely be? and what happens if a write is lost or duplicated?

Mapping ShopFleet services on the staleness-tolerance vs. write-criticality axes. Services in the top-left demand strong consistency; services in the bottom-right are fine with eventual consistency.

Step 2 — Assign a Consistency Model to Each Service

With the matrix in hand, assign a model to each service and justify it with real consequences:

Payments — Linearizable + 2PC

A payment debit and credit must be atomic. If the Payments service charges the customer but the Order service never receives confirmation (or vice versa), the business has a fraud or revenue problem. The correct tool here is 2PC over XA between the Payments DB and the Orders DB — both live in the same region, latency is under 2 ms, and the transaction completes in under 50 ms. The blocking risk of 2PC is acceptable because we run a highly-available coordinator (Atomikos with a replicated coordinator log on PostgreSQL). Internally, the Payments DB uses a synchronous single-leader replication with a quorum write (w = majority) before returning to the coordinator.

Do not use Sagas for payment finalization. A compensating transaction that "refunds" a charge after the fact is not the same as never charging: the customer sees a charge appear and disappear, incurring potential bank holds. For money movement, atomicity at the time of the transaction is non-negotiable.

Inventory — Read-Your-Writes + Quorum Reads

Inventory is trickier than it looks. The real risk is overselling: two concurrent checkout flows reading stock = 1 and both successfully placing an order, leaving stock at −1. The solution is a conditional write (optimistic locking via a version counter) on the Inventory DB, not distributed coordination:

UPDATE inventory
SET quantity = quantity - 1, version = version + 1
WHERE sku_id = 'SKU-9921' AND version = 47 AND quantity >= 1;

If the UPDATE affects 0 rows, the checkout flow retries or surfaces an "out of stock" error. This is a single-node ACID transaction — no 2PC needed. For cross-region reads (showing stock on product pages), a quorum read (r = 2 of 3 replicas) is used, tolerating ~50 ms of stale display data. The actual deduction always happens on the primary.

Orders — Monotonic Reads + Saga Orchestration

An order flows through states: CREATED → PAYMENT_PENDING → PAID → FULFILLMENT_PENDING → SHIPPED → DELIVERED. Each transition touches a different service (Payments, Warehouse, Shipping). This is the ideal use case for an orchestrated Saga: the Order service is the orchestrator; it issues commands, waits for replies, and runs compensating transactions on failure.

For reads, a customer viewing their order history must never see an order "go backwards" (e.g., appear as SHIPPED then revert to PAID). This requires monotonic read consistency — achieved by routing all reads from a given user session to the same replica (session-affinity sticky routing). Sticky routing is implemented at the API gateway with a consistent-hash on the user ID to a replica pool.

Cart — Session Consistency + Eventual

A cart is inherently session-scoped. Nobody else reads your cart; you only care that your writes are visible to your subsequent reads. This is textbook read-your-writes (also called session consistency). Implementation: the Cart service writes to a Redis primary; reads are served from the same primary (or from a replica that has already acknowledged the write, using a replication offset check). Cart data is ephemeral by design — a 24-hour TTL means a lost write is a minor UX annoyance, not a business catastrophe. Eventual consistency with read-your-writes is the right call here.

Product Catalog — Eventual Consistency + CDN Caching

Product descriptions, images, and pricing change infrequently (a few thousand writes per day across 2 million SKUs). Reads vastly outnumber writes. A 30-second stale price on a product page is acceptable — the checkout step performs a fresh authoritative read before confirming the price. The Catalog service uses multi-leader replication across all three AWS regions (each region has a leader) with last-write-wins conflict resolution on the updated_at timestamp. Static assets are pushed to CloudFront (CDN) with a 5-minute TTL. This topology maximises read throughput globally while tolerating brief cross-region divergence.

User Profiles — Eventual Consistency

Profile changes (shipping addresses, display name, email preferences) are low-frequency and the user's own writes are immediately visible via read-your-writes on the primary. Cross-region replication is asynchronous. If a user updates their address in Singapore and immediately makes a purchase in the US region, the old address might be used on that one order — an acceptable edge case. A compensating mechanism (address confirmation email) handles the rare divergence. Eventual consistency with asynchronous replication is correct here.

Complete consistency strategy for ShopFleet: each service has a model, implementation, and staleness SLA matched to its real business requirements.

Step 3 — The Decision Framework (Generalised)

Use this five-question checklist on any service in any system:

What is the cost of a stale read? If the answer is "money lost" or "fraud", choose strong consistency. If the answer is "minor UX glitch", choose eventual.
What is the cost of a lost or duplicated write? If it is irreversible (money moved, stock deducted), use atomic writes with optimistic locking or 2PC. If it is recoverable, idempotency + at-least-once delivery is enough.
Are participants co-located and few in number? 2PC is viable for 2–3 databases in the same data centre. For anything spanning regions or more than ~5 participants, use Sagas.
Is the operation long-running (over 100 ms)? Never use 2PC. Use a Saga with compensating transactions. Lock hold time is proportional to saga step latency — that is the key insight.
What replication lag can the SLA absorb? Translate your business SLA ("users see updated stock within 1 second") into a replication budget. Design read paths (quorum, sync, async) to hit that budget.

The most common mistake is applying the strongest available consistency everywhere — linearizability across all services — because it feels "safe". It is not safe; it is brittle. It couples your availability to the weakest link in your coordination chain. Tailoring consistency per-service is not a compromise; it is engineering discipline.

Always do a "price-check at checkout" fence. Even when using eventual consistency for product display, perform a fresh authoritative read of price and stock against the primary just before confirming an order. This two-tier approach — eventual for browsing, strong for commitment — is the industry-standard pattern used by Amazon, Shopify, and virtually every large e-commerce platform.

Step 4 — Validating Your Design

A consistency strategy is only as good as its failure-mode analysis. For each service, ask: what happens if the primary DB fails right now? and what happens if a cross-region network partition lasts 30 seconds? For ShopFleet:

Payments primary fails: The synchronous replica is promoted (RDS Multi-AZ, ~30 s). In-flight 2PC transactions that did not complete abort cleanly on coordinator timeout. No data loss.
Inventory primary fails: CAS writes fail; checkout surfaces "temporarily unavailable". Display reads fall back to the quorum-read replicas. Recovery in ~30 s.
30-second partition, us-east ↔ eu-west: Product Catalog diverges — EU users may see slightly different prices. The checkout authoritative read fires against the user's region primary, so no incorrect order is placed. Cart writes in EU queue locally and flush on partition heal.

Running these failure scenarios mentally — or better, in a chaos-engineering tool like Gremlin or AWS Fault Injection Simulator — turns a paper design into a validated architecture.

Summary

The right consistency strategy is not a single setting; it is a per-component decision driven by business impact, participant topology, and latency budgets. ShopFleet uses four distinct models across six services: linearizable + 2PC for money, optimistic-lock quorum for stock, monotonic reads + Sagas for order workflow, and eventual consistency for everything user-browsable. This is the normal, correct state of a well-designed distributed system — not a failure to achieve uniformity.

Previous Lesson The Saga Pattern

Back to Course System Design

Back to Course

Project: Choose a Consistency Strategy

Project: Choose a Consistency Strategy

The Sample System: ShopFleet

Step 1 — Classify Each Service by Read/Write Profile and Staleness Tolerance

Step 2 — Assign a Consistency Model to Each Service

Payments — Linearizable + 2PC

Inventory — Read-Your-Writes + Quorum Reads

Orders — Monotonic Reads + Saga Orchestration

Cart — Session Consistency + Eventual

Product Catalog — Eventual Consistency + CDN Caching

User Profiles — Eventual Consistency

Step 3 — The Decision Framework (Generalised)

Step 4 — Validating Your Design

Summary

Tutorial Complete!