The CAP Theorem
The CAP Theorem
In 2000, Eric Brewer proposed what became known as the CAP theorem: a distributed data store can provide at most two of the following three guarantees at the same time — Consistency, Availability, and Partition Tolerance. Two years later, Gilbert and Lynch proved this formally. Understanding CAP is not about picking a checkbox; it is about reasoning clearly about what your system will do when things go wrong.
The Three Guarantees
- Consistency (C): Every read receives the most recent write — or an error. All nodes see the same data at the same time. This is the "single-system illusion": a cluster of machines behaves as if it were one perfectly synchronized machine.
- Availability (A): Every request receives a response (not an error), but the data returned might not be the latest. The system stays operational even if some nodes are down.
- Partition Tolerance (P): The system continues to operate despite arbitrary message loss or delay between nodes. A network partition means that some nodes cannot reach others — messages are dropped or indefinitely delayed.
Why Partition Tolerance Is Non-Negotiable
Network partitions happen. Hardware fails, switches misbehave, data-center links saturate. In a real distributed system, you must tolerate partitions — dropping P is only feasible for single-node or tightly-coupled systems where you control the entire network (and even then, you are just making the partition probability very small, not zero). So the real choice is: when a partition occurs, do you sacrifice C or A?
CP Systems — Consistency Over Availability
A CP system refuses to respond (returns an error or timeout) rather than return potentially stale data. During a partition, the minority partition shuts down writes — or all writes — until quorum is restored.
- HBase, Apache ZooKeeper: ZooKeeper's leader-election protocol guarantees that only one node answers at a time. If a follower cannot reach the leader, it refuses to serve reads.
- MongoDB with
writeConcern: majority: A write must be acknowledged by a majority of replica-set members before it is confirmed. During a partition where the primary cannot reach enough secondaries, writes block. - Etcd: Used by Kubernetes for configuration storage; prioritizes consistency — a quorum must agree before any read or write succeeds.
When to choose CP: financial ledgers, inventory counts, seat reservations, any domain where showing stale data is worse than showing an error (e.g., overselling airline seats).
AP Systems — Availability Over Consistency
An AP system keeps responding to every request even during a partition, but different nodes might return different (stale) values. Once the partition heals, nodes reconcile — this is the foundation of eventual consistency.
- Apache Cassandra: With a replication factor of 3 and
consistency level ONE, any single node can answer. During a partition, two sides of the cluster independently serve reads and writes, then sync when connectivity returns. - Amazon DynamoDB (default): Eventually consistent reads are the default; strongly consistent reads are opt-in and cost twice as many read capacity units.
- DNS: A canonical AP example — DNS propagation takes minutes to hours, yet DNS servers always answer. You might get a stale IP for a short window.
When to choose AP: shopping carts, social media timelines, product catalogs, view counters — domains where slight staleness is acceptable and downtime is more damaging than inconsistency.
The CA Corner — A Special Case
Textbooks list CA systems (Consistent + Available, no Partition Tolerance) and give traditional relational databases as examples. This is true in a single-node setup — a standalone PostgreSQL instance is both consistent and available because there is no partition. But as soon as you introduce replication (read replicas, multi-AZ standby), partitions become possible. Calling a distributed system "CA" is usually wishful thinking. In practice, choose between CP and AP.
Beyond the Binary — PACELC
CAP only talks about partition-time behavior. In 2012, Daniel Abadi extended it with the PACELC model: "If there is a Partition, choose between Availability and Consistency; Else (during normal operation), choose between Latency and Consistency." Even without a partition, strong consistency requires coordination between nodes, which adds latency. Cassandra (AP/EL) can be tuned for lower latency at the cost of consistency. DynamoDB global tables let you tune per-table. This is why many modern systems expose tunable consistency levels rather than a fixed pick.