Caching & CDNs

Caching Pitfalls

18 min Lesson 8 of 10

Caching Pitfalls

Caching dramatically improves performance — but it introduces a class of failure modes that can be more damaging than having no cache at all. Three problems recur at scale across almost every production system: the cache stampede, the thundering herd, and the twin challenges of hot keys and stale data. Understanding them at a mechanical level is the difference between a cache that saves your database and one that destroys it.

Cache Stampede (Dog-piling)

Imagine a popular product page cached with a 60-second TTL. The moment the TTL expires, thousands of concurrent requests all check the cache, get a miss, and simultaneously fire an identical database query. The database — designed to serve a few dozen such queries per second — receives thousands at once. Response times spike, connection pools exhaust, and the cascading slowdown causes more requests to pile up behind the ones already waiting. This is a cache stampede, also called dog-piling.

The key characteristic is that the problem is caused by the cache, not by an absence of it. Without a cache the database would always bear this load; with a cache it normally bears almost none, so its capacity is provisioned accordingly — making it especially vulnerable when the cache suddenly fails to protect it.

Cache Stampede: TTL Expiry Triggers Mass Database Queries Time Cache WARM — requests served from cache TTL Expires Cache MISS — stampede Req 1 Req 2 Req 3 5,000 more... Database OVERLOADED Cache (MISS - expired) 5,000+ simultaneous queries Solutions: 1. Mutex/Lock: only one thread recomputes; others wait then read the refreshed value. 2. Probabilistic Early Recomputation: refresh before TTL hits zero (XFetch algorithm).
When a cached key expires, thousands of concurrent requests simultaneously miss and hammer the database.

Fixing the Stampede

Three proven mitigations exist, and they are often layered:

  • Mutex / distributed lock: When a miss is detected, one thread acquires a lock and recomputes the value. All other threads either wait for the lock to release and then read the freshly-populated cache, or serve a slightly stale value while the recomputation runs. Redis SET NX PX (set if not exists, with expiry) is the standard primitive.
  • Probabilistic Early Recomputation (XFetch): Instead of waiting for TTL = 0, each request computes a probability of refreshing the cache that increases as the expiry approaches. Popular keys are refreshed slightly early by random traffic, smoothing out the expiry cliff. The formula is: -delta * beta * ln(random()) where delta is the time to recompute and beta controls aggression (1.0 is standard).
  • Background refresh: A dedicated async job proactively re-warms keys before they expire. Simple and predictable, but requires operational overhead (a scheduler, monitoring the refresh job).
Best Practice: For high-traffic keys, combine a short TTL (e.g. 30 s) with a background refresh job that runs every 25 s. The cache is always warm; the database is hit at a controlled, predictable rate — not by a sudden crowd.

Thundering Herd

The thundering herd is closely related but broader. It describes any situation where a large number of processes or threads simultaneously wake up and contend for the same shared resource. In caching it happens most visibly at cold start: when a caching tier is restarted (e.g., after a Redis failover), the cache is completely empty. The first wave of incoming requests all miss, overload the database, cause slow responses, and trigger retries — which adds even more load. In the worst case the database crashes before the cache has a chance to warm up, creating a death spiral.

The same pattern emerges when a cache cluster scales out and new nodes join with empty partitions. Keys that were on the old nodes must be re-fetched from the database all at once.

Mitigations include cache warm-up scripts (pre-load the top-N most-accessed keys from a pre-computed list before routing traffic to a new node), request coalescing (collapse duplicate in-flight requests into a single upstream fetch — Nginx proxy_cache_lock does exactly this), and circuit breakers that shed load during a cold-start window rather than letting the database absorb it all.

Hot Keys

A hot key is a single cache entry (or a small group of entries) that receives a disproportionate share of traffic. Consider a live sports score, a celebrity profile, or the front page of a news site during a breaking story. Millions of requests per minute may converge on a single key. Even though the key is served from cache, if a single cache node owns that key, that node can saturate its CPU or network bandwidth — becoming a bottleneck even though the database is not involved at all.

Hot Key Problem and Local In-Process Cache Solution Before: Single Hot Key Node App Server 1 App Server 2 App Server 3 App Server 4 Cache Node key: "trending" SATURATED After: Local In-Process Cache App 1 + L-Cache App 2 + L-Cache App 3 + L-Cache App 4 + L-Cache Redis Cache (fallback only) served locally served locally served locally served locally
Left: all servers hammer one saturated cache node. Right: an in-process local cache absorbs nearly all requests; Redis only handles misses.

Solutions for hot keys include:

  • Local (in-process) cache: Store a copy of the hot key in each application server's memory. Requests never leave the process. The trade-off is that each server independently refreshes the key, so a short TTL (1–5 s) is necessary to keep values roughly consistent across instances.
  • Key replication: Some distributed caches allow you to write a hot key to multiple nodes under different names (e.g., trending:0, trending:1, trending:2) and read from a random replica. This spreads the read load while keeping data consistent because all replicas are written on every update.
  • Read-through with request coalescing: At the cache node level, collapse many simultaneous reads for the same key into a single upstream fetch, then fan the response back to all waiting callers.

Staleness

Every cache introduces the risk of serving outdated data. Staleness is not always a bug — serving a product price that is 10 seconds old is usually acceptable; serving an account balance that is 30 minutes old is not. The danger is silent staleness: data that is expired in the business sense but not in the TTL sense, because the TTL was set too long or an invalidation event was missed.

Common staleness traps include:

  • TTL set-and-forget: A developer sets TTL = 3600 during early development when traffic is low. As the system grows, a lot changes in an hour — but no one revisits the TTL.
  • Fan-out invalidation gaps: A single piece of data (e.g., a user name) is cached under multiple keys or in multiple layers (CDN + Redis + local). An update clears one key but misses the others.
  • Write-behind lag: In write-behind (write-back) caching, writes are batched and flushed asynchronously. If the application crashes before the flush, the cache diverges from the database permanently.
Warning — Fixing Staleness Can Cause a Stampede: If you shorten TTLs across the board to reduce staleness, you increase the frequency of cache misses, which raises the probability of a stampede. Always pair shorter TTLs with stampede protection (mutex or early recomputation).

Compound Failures

In practice, pitfalls compound. A hot key expires (stampede) while a new cache node is joining the cluster with empty partitions (thundering herd), and the data being sought is also write-behind with unflushed changes (staleness). Each pitfall alone is manageable; all three together can cascade into a total outage.

The defensive design principle is graceful degradation: if the cache is unavailable or a stampede is in progress, the system should shed non-critical load, serve stale-but-acceptable data from a fallback, and never let cache failure translate directly into database failure. That separation of failure domains is what makes caching safe at scale.

Key Insight: Caching pitfalls are fundamentally coordination problems. Multiple independent agents (servers, threads) each make individually rational decisions (query the database on a miss) that collectively produce an irrational outcome (database overload). The fixes — locks, probabilistic refresh, local caches — are all forms of coordination that break the collective irrationality.