Consistency & Caching

Every cache introduces a second copy of your data. The moment two copies exist, they can diverge — and when they do, users see stale results, billing systems charge wrong amounts, or inventory counters go negative. Consistency is the discipline of keeping those copies in sync. This lesson explores the consistency models you must reason about, the specific failure modes caching introduces, and the concrete strategies engineers use at companies like Amazon, Netflix, and Slack to keep caches and source-of-truth databases aligned.

The Fundamental Tension

Caching and strict consistency are in direct conflict. A cache exists precisely because going to the database on every request is expensive. But every moment the cache holds a value without checking the database is a moment the value could be stale. The engineering challenge is choosing how much staleness you can tolerate, and making that decision explicit for each data type rather than arriving at it by accident.

Key Insight: Consistency is not a binary property. Different data in the same system can have different staleness tolerances. A product price might be acceptable at 60-second staleness; a bank balance must be zero-staleness (always read from the primary database, never from cache).

Consistency Models in Distributed Systems

Before examining caching specifically, it helps to have names for the consistency levels you can aim for:

Strong consistency: Every read returns the most recently written value. No stale reads ever. Cost: every read must check (or go to) the primary store, which eliminates most of the performance benefit of caching.
Eventual consistency: If no new writes occur, all copies will converge to the same value — eventually. Stale reads are possible in the window between a write and cache propagation. Most large-scale caching systems operate here.
Read-your-own-writes: A user always sees their own most recent write, even if other users might see a slightly older value. This is the minimum consistency bar for most user-facing features (a comment you just posted must appear to you immediately).
Monotonic reads: Once you read a value at version V, you will never read a value older than V on subsequent reads. Prevents the jarring experience of a page going "backwards" on refresh.
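
To make the last model concrete, here is a minimal sketch of a client-side guard for monotonic reads. It assumes every stored value carries an integer version; the `MonotonicReader` class and the `(version, value)` tuple shape are hypothetical names chosen for illustration, not part of any particular cache library.

```python
from typing import Dict, Optional, Tuple

Versioned = Tuple[int, str]  # (version, value); the version field is an assumption


class MonotonicReader:
    """Tracks the newest version seen per key and rejects older copies."""

    def __init__(self) -> None:
        self._last_seen: Dict[str, int] = {}

    def accept(self, key: str, candidate: Versioned) -> Optional[str]:
        version, value = candidate
        if version < self._last_seen.get(key, -1):
            return None  # older than something already shown; caller should read elsewhere
        self._last_seen[key] = version
        return value


reader = MonotonicReader()
fresh = reader.accept("post:1", (7, "edited text"))    # first read sees version 7
stale = reader.accept("post:1", (6, "original text"))  # lagging copy at version 6 is refused
same = reader.accept("post:1", (7, "edited text"))     # the same version is still acceptable
```

When `accept` returns `None`, the caller would typically retry against a fresher copy or the primary rather than show the older value.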

Consistency models on a spectrum — from eventual (cache-friendly, high performance) to strong (must bypass cache, full DB cost).

How Caches Break Consistency

Three concrete failure modes occur in real systems:

Write-after-read (stale reads): User A reads product stock and the cache returns 5. User B then buys all 5 items, writing stock = 0 to the database. The cached entry has a 30-second TTL and nothing invalidates it, so until it expires (up to 30 seconds after it was cached) every request served from that entry sees stock = 5 and can attempt a purchase that will fail at checkout.
Cache stampede on invalidation: You delete a popular cache key after a write. Hundreds of concurrent requests simultaneously miss the cache, all hit the database, all compute the same value, and all attempt to write it back. The database is hammered and the cache is repopulated with near-identical redundant work. This is sometimes called a thundering herd.
Partial update visibility: A user updates their profile (name + avatar). The write goes to the database. The name cache key is invalidated immediately; the avatar cache key expires only in 10 minutes. For 10 minutes, queries for this user see the new name but the old avatar — an internally inconsistent view of the same entity.
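
The first failure mode can be reproduced in a few lines. This is a sketch under stated assumptions: `TTLCache` is a hypothetical in-memory cache, a plain dictionary stands in for the database, and the clock is passed in as `now` so the timeline is deterministic.

```python
class TTLCache:
    """Tiny in-memory TTL cache; the caller supplies the current time."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key, now):
        entry = self._entries.get(key)
        if entry is None or now >= entry[1]:
            return None  # miss or expired
        return entry[0]

    def set(self, key, value, now):
        self._entries[key] = (value, now + self.ttl)


database = {"stock:sku-1": 5}
cache = TTLCache(ttl_seconds=30)


def read_stock(now):
    cached = cache.get("stock:sku-1", now)
    if cached is not None:
        return cached
    value = database["stock:sku-1"]
    cache.set("stock:sku-1", value, now)
    return value


at_t0 = read_stock(now=0)      # User A reads: miss, caches 5 until t=30
database["stock:sku-1"] = 0    # User B buys everything at t=1; the cache is untouched
at_t29 = read_stock(now=29)    # still served from the cache
at_t30 = read_stock(now=30)    # entry expired, reloaded from the database
```

Here `at_t29` is still 5 even though the database has held 0 since t=1; only the read at t=30 observes the real stock.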

Pitfall — Silent Staleness: The most dangerous inconsistency is the kind nobody notices during testing. In a low-traffic staging environment, most entries expire between requests, so reads usually miss the cache and fetch fresh data, and everything looks correct. In production, a hot key is read continuously for its entire TTL: with a 10-minute TTL, a value that went stale shortly after it was cached can be served millions of times before it expires. Always test with realistic TTLs under realistic load.

Strategy 1 — Write-Through with Atomic Invalidation

In a write-through cache, every write goes to both the database and the cache as one logical operation. The variant described here invalidates instead of updating: each write deletes the cache entry as soon as the database write succeeds. The window of staleness is then reduced to the gap between the database commit and the cache delete, typically sub-millisecond with a Redis cluster in the same datacenter.

The catch: atomic update of two systems (database + cache) without a distributed transaction means you must handle partial failures. The recommended pattern is write database first, then delete cache key — not write cache first. Reasoning: if the cache delete fails, the worst outcome is a stale cache read; the database (source of truth) is correct and the stale key will expire. If you wrote the cache first and the database write fails, you now have a cache serving data that was never persisted.

Best Practice — Delete, Do Not Update: On a write, delete (invalidate) the cache entry rather than updating it. The next read will repopulate from the fresh database value. Updating-in-place requires you to compute the new cache value in the same code path as the write, which couples your caching logic tightly to your write logic and creates correctness bugs when the logic evolves.
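
The ordering argument above can be sketched as follows. Dictionaries stand in for the database and the cache, and `CacheUnavailable`, `write_price`, and `read_price` are hypothetical names; a real cache client raises its own error types and would also set a TTL, which is not modeled here.

```python
database = {}
cache = {}


class CacheUnavailable(Exception):
    """Stand-in for whatever error a real cache client raises."""


def write_price(product_id: str, price: int, cache_delete=cache.pop) -> None:
    database[product_id] = price          # 1. source of truth first
    try:
        cache_delete(product_id, None)    # 2. then invalidate; never update in place
    except CacheUnavailable:
        # The database is already correct. The stale entry survives until its
        # TTL (not modeled here) or a retry removes it.
        pass


def read_price(product_id: str) -> int:
    if product_id in cache:
        return cache[product_id]
    value = database[product_id]
    cache[product_id] = value             # repopulate on miss
    return value


write_price("sku-1", 100)
first = read_price("sku-1")     # miss, loads 100 and caches it
write_price("sku-1", 120)       # database updated, cache key deleted
second = read_price("sku-1")    # miss again, loads 120


def failing_delete(key, default=None):
    raise CacheUnavailable("cache node down")


write_price("sku-1", 150, cache_delete=failing_delete)
stale = read_price("sku-1")     # cache still holds 120
persisted = database["sku-1"]   # database holds 150
```

The final two lines show the partial-failure case: the read is stale, but nothing unpersisted is ever served, which is the reason for writing the database first.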

Strategy 2 — Cache-Aside with Version Keys

In cache-aside (lazy loading), the application manages the cache explicitly: check cache → on miss, load from DB → store in cache. To address the partial update visibility problem, store all of an entity's fields as one cache entry under a versioned key, e.g. user:{id}:v{version}, where version is an integer incremented in the database on every write. Readers build the key from the current version, so they need a cheap way to look it up (for example, a small version column or pointer key; which one to use is a design choice this lesson leaves open). After an update, the next cache lookup uses the new key, misses, and repopulates the whole entry from the database in one step. Old versioned keys expire naturally via TTL. No partial-update window exists because all fields for a version are stored as one cache entry.
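
A minimal sketch of versioned keys, assuming the version lives in the same row as the entity and is cheap to read. The dictionaries and the helper names (`cache_key`, `current_version`, `get_user`, `update_user`) are hypothetical stand-ins, and TTL-based expiry of old keys is not modeled.

```python
database = {"user:42": {"version": 1, "name": "Ada", "avatar": "a1.png"}}
cache = {}


def cache_key(user_id: int, version: int) -> str:
    return f"user:{user_id}:v{version}"


def current_version(user_id: int) -> int:
    # Assumption: reading the version is cheap (indexed column or small pointer key).
    return database[f"user:{user_id}"]["version"]


def get_user(user_id: int) -> dict:
    key = cache_key(user_id, current_version(user_id))
    if key not in cache:
        row = database[f"user:{user_id}"]
        # One cache entry holds every field for this version.
        cache[key] = {"name": row["name"], "avatar": row["avatar"]}
    return cache[key]


def update_user(user_id: int, name: str, avatar: str) -> None:
    row = database[f"user:{user_id}"]
    row.update(name=name, avatar=avatar, version=row["version"] + 1)
    # No cache delete: the old key is never requested again and ages out via TTL.


before = get_user(42)
update_user(42, name="Ada L.", avatar="a2.png")
after = get_user(42)
```

After the update, both `user:42:v1` and `user:42:v2` exist in the cache, but readers only ever ask for the second one, so name and avatar always change together.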

Strategy 3 — Event-Driven Invalidation (CDC)

For large, complex systems where writes come from many services, maintaining cache invalidation logic inside every write path becomes fragile. A more scalable approach is Change Data Capture (CDC): a background process reads the database's replication log (e.g. MySQL binlog via Debezium), and for every row change, publishes an event to a message bus (e.g. Kafka). Cache invalidation workers subscribe to these events and delete or refresh the relevant cache keys. This removes invalidation logic from application write paths entirely; reads still populate the cache as before. Because the pipeline is asynchronous, the staleness window is roughly the CDC lag.

CDC-based cache invalidation: the database replication log drives cache deletes without coupling invalidation logic to application write code.
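
The worker side of this pipeline can be sketched as below. A `queue.Queue` stands in for the Kafka topic, and the event shape (`table`, `pk`, `op`) and the `<table>:<primary key>` cache key convention are assumptions made for illustration; they are not the actual Debezium message format.

```python
import queue

change_events = queue.Queue()  # stand-in for a Kafka topic
cache = {"products:7": {"price": 100}, "products:8": {"price": 50}}


def key_for(event: dict) -> str:
    # Assumed convention: cache keys are "<table>:<primary key>".
    return f"{event['table']}:{event['pk']}"


def drain_and_invalidate(events: queue.Queue, cache: dict) -> int:
    """Process every queued change event; return how many keys were deleted."""
    deleted = 0
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            return deleted
        if cache.pop(key_for(event), None) is not None:
            deleted += 1


change_events.put({"table": "products", "pk": 7, "op": "update"})
change_events.put({"table": "orders", "pk": 1, "op": "insert"})  # nothing cached for this row
removed = drain_and_invalidate(change_events, cache)
```

A long-running worker would block on the topic instead of draining once, but the key point is the same: the application code that changed product 7 never touched the cache.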

The Read-Your-Own-Writes Problem

A classic scenario: a user updates their profile picture, is redirected to their profile page, and sees the old photo. This happens because the write hit the primary database, but the redirect fetched the profile from a cache entry that had not been invalidated and whose 5-minute TTL had not yet expired. Solutions in order of complexity:

Invalidate aggressively on write: Delete the cache entry immediately after every write. Simple and usually sufficient for user-owned data.
Route post-write reads to the primary: For a short window after a write (e.g., 5 seconds), bypass the cache and read from the primary database. Store a short-lived per-user cookie or session flag indicating "has a pending write." Some data stores expose a related opt-in rather than doing this automatically: Amazon DynamoDB, for example, supports strongly consistent reads on a per-request basis.
Per-user cache namespacing: Partition cache keys by user session so that a user's reads never land on a key written by another process path. Heavier on cache memory but eliminates this entire class of bug.
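
The second option, routing post-write reads to the primary, might look like the sketch below. The 5-second window comes from the text; the dictionaries, the `last_write_at` map (standing in for a cookie or session flag), and the function names are hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
PIN_SECONDS = 5.0  # illustrative window taken from the text

primary = {"avatar:42": "old.png"}
cache = {"avatar:42": "old.png"}  # warm entry that has not been invalidated
last_write_at = {}                # user_id -> time of that user's latest write


def write_avatar(user_id: int, filename: str, now: float) -> None:
    primary[f"avatar:{user_id}"] = filename
    last_write_at[user_id] = now  # a real system might set a cookie or session flag


def read_avatar(viewer_id: int, owner_id: int, now: float) -> str:
    key = f"avatar:{owner_id}"
    wrote_at = last_write_at.get(viewer_id)
    if wrote_at is not None and now - wrote_at < PIN_SECONDS:
        return primary[key]       # viewer wrote recently: bypass the cache
    return cache.get(key, primary[key])


write_avatar(42, "new.png", now=100.0)
own_view = read_avatar(viewer_id=42, owner_id=42, now=101.0)    # within the window
other_view = read_avatar(viewer_id=7, owner_id=42, now=101.0)   # another user, cached copy
later_view = read_avatar(viewer_id=42, owner_id=42, now=106.0)  # window elapsed
```

Note that `later_view` falls back to the stale cached value because this sketch never invalidates the entry. In practice the window only needs to cover the time invalidation takes, so this technique complements invalidation rather than replacing it.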

Quantifying the Consistency Budget

A useful mental model is to assign an explicit consistency budget to each data type: the maximum acceptable staleness in seconds. Typical starting points, to be adjusted for your own product:

Account balance, payment status: 0 seconds — never cache; always read from primary.
Inventory stock count: 5–10 seconds — short TTL; invalidate on purchase.
Product price: 30–60 seconds — prices change infrequently; a short TTL is acceptable.
User profile (public, read by others): 60–300 seconds — acceptable; profiles are edited infrequently.
Global configuration / feature flags: 30–120 seconds — low write frequency; cache aggressively.
Homepage hero content / CMS pages: 300–3600 seconds — extremely low write rate; cache for minutes or hours.

Document these values in your system design; they drive TTL choices, invalidation strategy selection, and capacity planning for your cache tier. Treat the consistency budget as a first-class requirement, not an afterthought.
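
One way to make the budget a first-class requirement is to keep it in a single table that the caching layer consults. The sketch below uses the upper bound of each range listed above; the data type names and the `cache_ttl` helper are hypothetical.

```python
from typing import Optional

# Maximum acceptable staleness in seconds (upper bound of each range in the lesson).
CONSISTENCY_BUDGET_SECONDS = {
    "account_balance": 0,
    "inventory_stock": 10,
    "product_price": 60,
    "user_profile": 300,
    "feature_flags": 120,
    "cms_page": 3600,
}


def cache_ttl(data_type: str) -> Optional[int]:
    """Return the TTL to use, or None when the data must bypass the cache."""
    # A KeyError here forces every new data type to be classified explicitly.
    budget = CONSISTENCY_BUDGET_SECONDS[data_type]
    return budget if budget > 0 else None


balance_ttl = cache_ttl("account_balance")  # None: always read from the primary
price_ttl = cache_ttl("product_price")      # 60
```

Failing loudly on an unknown data type is a deliberate choice in this sketch: it prevents new data from silently inheriting a default TTL.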

Interview Tip: In a system design interview, proactively ask "What is the acceptable staleness for this data?" before choosing a caching strategy. Interviewers reward candidates who reason about consistency explicitly rather than defaulting to "just put everything in Redis with a 5-minute TTL."

Consistency in Multi-Region Deployments

When your system spans multiple geographic regions, consistency gets harder because replication lag adds to cache staleness. Suppose a write to the US-East primary database takes ~120 ms to reach the EU replica. Any EU cache entry that is invalidated before replication completes will, on the next read miss, repopulate from the still-stale replica. Strategies:

Delay publishing the cache delete event by a configurable replication lag margin (e.g., 200 ms), so the replica has caught up before the key is invalidated.
Version-stamp cached values with a database LSN (log sequence number); reject repopulation from replicas whose LSN is behind the write's LSN.
For strongly-consistent data, route all reads to the primary — accept the cross-region latency cost.
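
The LSN-stamping strategy can be sketched as follows. It assumes LSNs are comparable integers and that the reader knows the replica's replay position when it loads a value; `invalidate`, `repopulate`, and the `min_lsn` map are hypothetical names, not a real database or cache API.

```python
from typing import Dict, Tuple

cache: Dict[str, Tuple[int, str]] = {}  # key -> (lsn, value)
min_lsn: Dict[str, int] = {}            # key -> LSN of the write that invalidated it


def invalidate(key: str, write_lsn: int) -> None:
    cache.pop(key, None)
    min_lsn[key] = write_lsn            # remember how fresh a refill must be


def repopulate(key: str, value: str, replica_lsn: int) -> bool:
    """Cache a value read from a replica only if the replica has caught up."""
    if replica_lsn < min_lsn.get(key, 0):
        return False                    # replica is behind the write: do not cache
    cache[key] = (replica_lsn, value)
    return True


invalidate("profile:42", write_lsn=1050)
rejected = repopulate("profile:42", "old bio", replica_lsn=1040)  # replica lagging
accepted = repopulate("profile:42", "new bio", replica_lsn=1060)  # replica caught up
```

When a refill is rejected, the request can still return the value it read, or retry against the primary; the point is only that the lagging value is never written back into the cache.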

Consistency and caching are inseparable concerns. Every caching decision is simultaneously a consistency decision. Building robust systems requires you to be explicit, per data type, about which consistency model you are accepting — and to design your invalidation, TTL, and write path accordingly.