Strong Consistency & Quorums
Strong Consistency & Quorums
Eventual consistency is a practical tool, but some operations simply cannot tolerate stale reads. A bank transfer, an inventory reservation, or a distributed lock must reflect the latest committed state every time. Strong consistency guarantees exactly that: every read observes the most recent successful write, regardless of which node you ask. The question is how to achieve it across a cluster of N replicas without requiring every node to be healthy.
Introducing the Quorum
A quorum is the minimum number of nodes that must participate in an operation before it is considered successful. Two quorum numbers govern a replicated system:
W— the number of nodes that must acknowledge a write before the coordinator returns success to the client.R— the number of nodes that must respond to a read before the coordinator returns a value to the client.
The cluster has N total replicas. The golden rule is:
A typical production choice for N = 5 is W = 3, R = 3 (W + R = 6 > 5). Because 3 nodes acknowledged the write and 3 nodes are queried on the read, at least one node must be in both sets.
Why the Overlap Guarantees Freshness
Imagine you wrote value v2 to nodes A, B, and C (W = 3). Later, a read contacts nodes A, D, and E (R = 3). Node A is in both sets. It returns v2. The coordinator picks the value with the highest version stamp, so the client always receives v2 — never the stale v1 that only D and E know about.
Common Quorum Configurations
Different W/R combinations trade latency for consistency, all satisfying W + R > N:
- Read-heavy (W=N, R=1): Every node must acknowledge writes (slow writes), but any single node can serve reads (fast reads). Used when reads vastly outnumber writes and write latency is acceptable.
- Write-heavy (W=1, R=N): Writes are acknowledged immediately by one node (fast, low-durability), but reads must contact all nodes (expensive). Rarely used in practice.
- Balanced (W = R = ⌈(N+1)/2⌉): The most common setup. For N=5, both W and R are 3. Tolerates up to
N−Wnode failures on writes and up toN−Rfailures on reads.
Sloppy Quorums and Hinted Handoff
When nodes are temporarily unreachable, a strict quorum would reject the write and return an error to the client. Some systems (DynamoDB, Cassandra) implement a sloppy quorum: if the target replica is down, a different healthy node temporarily accepts the write and stores a "hint" that it must forward the data when the original node recovers. This keeps writes available at the cost of temporarily weakening the consistency guarantee — the owning node may serve a stale read until the hinted handoff completes.
Version Vectors and Conflict Resolution
In a quorum read, multiple replicas may return different versions of the same key. The coordinator must pick the winner. Two strategies:
- Last Write Wins (LWW): Each value carries a timestamp or monotonic counter. The highest wins. Simple, but clock skew on distributed nodes can discard a valid newer write.
- Version Vectors: Each node maintains a per-replica counter. The coordinator merges vectors and detects true conflicts (concurrent writes). The application resolves them (e.g., CRDTs, user-driven merge). Used by systems like Riak and DynamoDB in its vector-clock mode.
Real-World Implementations
In Apache Cassandra, you set the replication factor at the keyspace level and the consistency level per query. CONSISTENCY QUORUM on both reads and writes with RF=3 gives W=2, R=2, satisfying W + R = 4 > 3. CONSISTENCY ALL requires all replicas, eliminating staleness but reducing availability.
etcd, the Kubernetes configuration store, uses the Raft consensus algorithm internally. Raft is essentially a quorum protocol where the leader must receive acknowledgements from a majority of nodes before committing an entry. With five etcd nodes, a leader needs three (majority) to confirm a write — identical to a quorum of W=3 out of N=5.
Latency vs. Consistency
The core cost of a strict quorum is latency. A write does not return until W replicas respond, and a read waits for R. If one of those replicas is on a different availability zone with a 20 ms cross-zone round trip, your P99 write latency is dominated by the slowest of the W acknowledgements. This is why tuning W and R is a business decision: a shopping cart can tolerate eventual consistency (lower latency, higher availability), while a payment ledger demands strict quorum (higher latency, absolute correctness).
N=3, W=2, R=2. This is the sweet spot for most databases: tolerates one node failure on both reads and writes, adds only one extra network hop versus a single-node write, and satisfies the quorum rule comfortably (4 > 3).
Quorums Are Not a Silver Bullet
Even with W + R > N, linearizability is not automatic. If two concurrent writes race, the coordinator may see different values from different nodes in the same read round. Systems that need true linearizability (no anomalies whatsoever) must layer a consensus protocol (Raft, Paxos) on top — quorums alone only guarantee that a fully committed write is visible, not that concurrent writes are ordered correctly. We cover Raft and Paxos in Lesson 7.