Project: Add a Caching Layer
Throughout this tutorial we have studied caching theory — strategies, eviction policies, invalidation, distributed stores, CDNs, and pitfalls. Now we bring it all together. This capstone lesson walks you through the complete design process for adding a caching layer to a real-world read-heavy system: a social content platform (think a news feed, product catalogue, or blog aggregator) that serves 50 million requests per day and is struggling with database saturation at peak hours.
We will work through six stages: understand the workload, identify what to cache, choose strategies, design the topology, handle invalidation, and verify the result with concrete numbers. By the end you will have a complete, defensible cache architecture diagram.
Stage 1 — Understand the Workload
Before writing a single line of cache config, measure what you actually have. For our platform the profiling reveals:
- 50 M requests/day (~580 req/s average, ~2,500 req/s at peak)
- Read/write ratio: 95:5 — 95% of operations are reads
- Top 5 endpoints by DB query count: feed (home timeline), post detail, user profile, search suggestions, trending topics
- Slowest queries: feed aggregation (~120 ms), full-text search (~80 ms), recommendation engine (~200 ms)
- Data freshness requirement: feeds tolerate up to 60 s of staleness; trending topics up to 5 min; user profiles up to 30 s
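The traffic figures above can be cross-checked with a few lines of arithmetic. This is a minimal sketch that uses only the numbers from the profile; the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

requests_per_day = 50_000_000
peak_rps = 2_500
read_fraction = 0.95  # from the 95:5 read/write ratio

average_rps = requests_per_day / SECONDS_PER_DAY  # about 579 req/s
peak_to_average = peak_rps / average_rps          # how spiky the traffic is
peak_read_rps = peak_rps * read_fraction
peak_write_rps = peak_rps - peak_read_rps

print(f"average: {average_rps:.0f} req/s")
print(f"peak is {peak_to_average:.1f}x the average")
print(f"at peak: {peak_read_rps:.0f} reads/s, {peak_write_rps:.0f} writes/s")
```

The peak-to-average ratio matters because the cache has to be sized for the peak, not the daily mean.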
Stage 2 — Identify Cache Candidates
Not everything deserves to be cached. Apply the three-question filter to every data object:
- Is it read far more often than it is written? (read:write ratio > 10:1 is a good heuristic)
- Is it expensive to compute or fetch? (involves joins, aggregations, external API calls, or ML inference)
- Can it tolerate a short window of staleness? (even a 30-second TTL absorbs most repeated reads of a hot row)
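The three questions can be written down as a simple predicate. The sketch below is one possible encoding, not a standard formula: the Candidate type, the 20 ms cost threshold, and the sample values are assumptions made for illustration (only the 10:1 ratio heuristic and the feed's ~120 ms cost and 60 s staleness come from the text).

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    reads_per_write: float  # observed read:write ratio for this object
    cost_ms: float          # time to compute or fetch it on a miss
    max_staleness_s: float  # 0 means it must always be fresh


def is_cache_candidate(c: Candidate,
                       min_ratio: float = 10.0,
                       min_cost_ms: float = 20.0) -> bool:
    """Apply the three-question filter; thresholds are illustrative."""
    read_heavy = c.reads_per_write > min_ratio
    expensive = c.cost_ms >= min_cost_ms
    tolerates_staleness = c.max_staleness_s > 0
    return read_heavy and expensive and tolerates_staleness


# Hypothetical sample objects.
home_feed = Candidate("home_feed", reads_per_write=19.0,
                      cost_ms=120.0, max_staleness_s=60.0)
audit_log = Candidate("audit_log", reads_per_write=0.5,
                      cost_ms=5.0, max_staleness_s=0.0)

print(home_feed.name, is_cache_candidate(home_feed))
print(audit_log.name, is_cache_candidate(audit_log))
```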
Applying these filters to our platform produces the following cache plan:
- Home timeline (feed): Pre-computed per user, stored in Redis. Read on every timeline load; rebuilt only when a followed account publishes a new post. TTL: 60 s, plus event-driven invalidation on new post.
- Post detail page: Rarely changes after publish. Cache with a long TTL (24 h) and invalidate explicitly on edit or delete.
- User profile summary: Name, avatar URL, follower count. Changes rarely. TTL: 30 s. Cache at the application layer.
- Trending topics: A ranked list recomputed every 5 minutes by a background job. Cache the latest result; replace on every recomputation. TTL: 5 min.
- Search autocomplete suggestions: A small lookup table per prefix, precomputed nightly. Cache until the next nightly rebuild, then invalidate. Served from a CDN edge for low-latency global delivery.
- Static assets (JS, CSS, images): Immutable per deploy (content hash in filename). Cache at CDN with Cache-Control: public, max-age=31536000, immutable.
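The content-hash naming used for static assets can be sketched as follows. This is a minimal illustration, assuming SHA-256 and an 8-character digest prefix; real build tools choose their own hash and naming scheme, and the function name here is hypothetical.

```python
import hashlib
from pathlib import PurePosixPath

# Header value from the cache plan above.
IMMUTABLE_CACHE_CONTROL = "public, max-age=31536000, immutable"


def hashed_filename(name: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a short content digest before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    path = PurePosixPath(name)
    return f"{path.stem}.{digest}{path.suffix}"


v1 = hashed_filename("app.js", b"console.log('v1');")
v2 = hashed_filename("app.js", b"console.log('v2');")
print(v1, v2, IMMUTABLE_CACHE_CONTROL, sep="\n")
```

Because any change to the file's bytes changes the filename, the one-year max-age never serves a stale asset: a new deploy simply references a new URL.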
Stage 3 — Choose Strategies per Object Type
Different data shapes call for different caching strategies. Here is the mapping for our system:
- Feed / timeline: Cache-Aside + Write-Through on fan-out. When a user publishes a post, the write path fans out the post ID to each follower's cached feed list in Redis (write-through). On read, if the feed key is missing (cold start or expired), load from DB and re-populate (cache-aside / lazy load).
- Post detail: Cache-Aside with long TTL. First read populates the cache. Subsequent reads hit cache. On edit, explicitly delete the cache key (invalidate-on-write). On delete, same.
- User profile: Cache-Aside with short TTL. Short TTL (30 s) means stale data self-heals quickly. Explicit invalidation only for critical fields like account suspension.
- Trending topics: Background Refresh. A cron job recomputes the list and writes it directly to the cache every 5 minutes, independent of the normal read path. Clients always read from cache; after warm-up, misses for this key should be rare.
- Static assets: CDN cache with long TTL + cache-busting by filename hash. New deploy = new filename = cache miss once globally; old URL continues serving the old file from CDN edge until it expires naturally.
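The cache-aside read path and invalidate-on-write described for post detail can be sketched as below. TTLCache is an in-memory stand-in for Redis so that the example runs on its own; the class, the function names, and the "post:<id>" key format are assumptions for illustration, not part of any library.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for a shared cache such as Redis (assumption)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:  # expired: treat as a miss
            del self._data[key]
            return None
        return value

    def set(self, key: str, value: Any, ttl_s: float) -> None:
        self._data[key] = (self._clock() + ttl_s, value)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


def get_post(cache: TTLCache, db: Dict[int, str], post_id: int,
             ttl_s: float = 24 * 3600) -> str:
    """Cache-aside read: try the cache, fall back to the DB, then populate."""
    key = f"post:{post_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    post = db[post_id]  # miss: load from the source of truth
    cache.set(key, post, ttl_s)
    return post


def update_post(cache: TTLCache, db: Dict[int, str], post_id: int,
                new_body: str) -> None:
    """Invalidate-on-write: update the DB first, then delete the cache key."""
    db[post_id] = new_body
    cache.delete(f"post:{post_id}")
```

Deleting the key rather than rewriting it keeps the write path simple: the next reader repopulates the entry from the database.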
Stage 4 — Design the Full Topology
With the strategy per object type decided, we can now lay out the full architecture. The system has three distinct caching tiers working in concert:
- Tier 1 (CDN edge): Serves static assets (JS, CSS, images) and cacheable full pages (e.g., public post pages with Cache-Control: s-maxage=60). Latency: typically <20 ms from a nearby edge location. Hit ratio for static: ~99%. Hit ratio for pages: ~70–80%.
- Tier 2 (application-layer local memory cache, L1): Each app server holds a small in-process cache (e.g., 256 MB using a library like Caffeine in Java or lru-cache in Node). This avoids even the Redis network hop for extremely hot keys (e.g., global config, feature flags, top trending topic). TTL: 5–30 s. Data is per-process, so consistency is relaxed; only use it for data where brief inter-node divergence is acceptable.
- Tier 3 (Redis Cluster, L2): The central distributed cache shared by all app servers. Holds user feeds, post objects, user profiles, session tokens, and rate-limit counters. Latency: ~0.5–2 ms. Hit ratio target: >90%.
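The lookup order across the application-side tiers (L1 local memory, then L2 shared cache, then the database) can be sketched as follows. Plain dictionaries stand in for the in-process cache and for Redis, and TTLs are omitted for brevity; the function name and the stats keys are hypothetical.

```python
from typing import Callable, Dict


def tiered_get(key: str,
               l1: Dict[str, str],
               l2: Dict[str, str],
               load_from_db: Callable[[str], str],
               stats: Dict[str, int]) -> str:
    """Look up L1, then L2, then the DB, filling the faster tiers on the way back."""
    if key in l1:
        stats["l1_hit"] += 1
        return l1[key]
    if key in l2:
        stats["l2_hit"] += 1
        l1[key] = l2[key]  # promote to the local tier
        return l1[key]
    stats["db_load"] += 1
    value = load_from_db(key)
    l2[key] = value  # shared by all app servers
    l1[key] = value  # local to this process only
    return value


db = {"profile:42": "Ada"}
shared_l2: Dict[str, str] = {}
server_a_l1: Dict[str, str] = {}
server_b_l1: Dict[str, str] = {}
stats = {"l1_hit": 0, "l2_hit": 0, "db_load": 0}

tiered_get("profile:42", server_a_l1, shared_l2, db.__getitem__, stats)  # DB load
tiered_get("profile:42", server_a_l1, shared_l2, db.__getitem__, stats)  # L1 hit
tiered_get("profile:42", server_b_l1, shared_l2, db.__getitem__, stats)  # L2 hit
print(stats)
```

The third call shows why the shared tier matters: a second app server with a cold local cache still avoids the database.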
Stage 5 — Invalidation Design
The topology tells us where things live; invalidation design tells us when and how cached entries are removed or refreshed once they go stale. The per-object triggers are the TTLs and events chosen in Stages 2 and 3. The key design decisions in this invalidation scheme:
- Event-driven invalidation via a message queue decouples the write path from cache management. The HTTP request that handles the POST does not block waiting for Redis DEL operations across thousands of follower keys; the fanout happens asynchronously.
- Delete, do not update, on fanout. For authors with thousands of followers, pushing the new post into each follower's feed in Redis synchronously would take seconds. Deleting a feed key is a single cheap operation, and the next read re-populates the feed lazily.
- CDN purge for public pages. A cached post detail page at the CDN edge must be purged when the post is edited or deleted. Major CDN providers (Cloudflare, Fastly, Akamai) expose a purge API for exactly this purpose. Budget for the extra latency of a CDN purge call (~50–200 ms) in your write path.
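The decoupled fanout can be sketched with a queue between the write path and a worker. This is a single-process illustration under stated assumptions: queue.Queue stands in for the message queue, a dictionary stands in for Redis (where the pop would be a DEL), and the event shape, function names, and "feed:<id>" key format are hypothetical.

```python
import queue
from typing import Dict, List


def handle_new_post(events: "queue.Queue[dict]", author_id: int,
                    post_id: int) -> None:
    """Write path: enqueue an event and return without touching follower keys."""
    events.put({"type": "post_created",
                "author_id": author_id,
                "post_id": post_id})


def run_invalidation_worker(events: "queue.Queue[dict]",
                            followers: Dict[int, List[int]],
                            cache: Dict[str, list]) -> int:
    """Drain the queue and delete each follower's feed key.

    Returns the number of keys that were actually deleted.
    """
    deleted = 0
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            return deleted
        for follower_id in followers.get(event["author_id"], []):
            # Delete rather than update; the next read rebuilds the feed.
            if cache.pop(f"feed:{follower_id}", None) is not None:
                deleted += 1


events: "queue.Queue[dict]" = queue.Queue()
followers = {1: [10, 11]}
cache = {"feed:10": [7, 8], "feed:11": [7], "feed:12": [9]}

handle_new_post(events, author_id=1, post_id=99)
keys_before_worker = len(cache)  # the request handler deleted nothing itself
deleted = run_invalidation_worker(events, followers, cache)
print(keys_before_worker, deleted, sorted(cache))
```

In a real deployment the worker would run in a separate process or service, so the latency of the fanout never appears in the request that created the post.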
Stage 6 — Verify with Numbers
A caching design is only credible if you can show the expected impact. Let us run the numbers for our platform:
- Before: Peak 2,500 req/s, average 5 DB queries per request = 12,500 queries/s hitting the DB. With a single Primary DB capped at ~8,000 qps, the system is already over capacity.
- After (CDN hit rate 70% for dynamic pages): 2,500 × 0.30 = 750 req/s reach the app servers.
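- After (Redis hit rate 92% across all object types): 750 × 0.08 = 60 req/s miss the cache and reach the DB, and writes add ~125 req/s (5% of 2,500), for ~185 DB-bound req/s. At one query per request that is ~185 qps, under 3% of the 8,000 qps cap; at the 5 queries per request assumed above it is ~925 qps, still under 12%.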
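- Response time: CDN hit: ~15 ms. Redis hit: ~3 ms. Cache miss: ~45 ms. For requests that reach the app servers, the hit-rate-weighted mean is 0.92 × 3 + 0.08 × 45 ≈ 6 ms (down from ~140 ms without cache). This is a mean, not a true P50; since 92% of those requests are Redis hits, their median sits near 3 ms.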
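The load funnel above, and the hit-rate check used to watch it in production, can be expressed in a few lines. This is a sketch under stated assumptions: the function names are illustrative, the load estimate counts requests rather than individual queries, and the hits and misses arguments are meant to be the keyspace_hits and keyspace_misses counters that Redis reports in INFO stats.

```python
def db_bound_rps(peak_rps: float, cdn_hit_rate: float,
                 cache_hit_rate: float, write_fraction: float) -> float:
    """Requests per second that still reach the database at peak."""
    app_rps = peak_rps * (1.0 - cdn_hit_rate)     # survives the CDN
    miss_rps = app_rps * (1.0 - cache_hit_rate)   # survives the cache
    write_rps = peak_rps * write_fraction         # writes always hit the DB
    return miss_rps + write_rps


def hit_rate(keyspace_hits: int, keyspace_misses: int) -> float:
    """Fraction of lookups served from cache; 0.0 if there were none."""
    total = keyspace_hits + keyspace_misses
    return keyspace_hits / total if total else 0.0


def should_alert(keyspace_hits: int, keyspace_misses: int,
                 threshold: float = 0.85) -> bool:
    """True if the hit rate fell below the threshold (never on zero traffic)."""
    if keyspace_hits + keyspace_misses == 0:
        return False
    return hit_rate(keyspace_hits, keyspace_misses) < threshold


load = db_bound_rps(2_500, cdn_hit_rate=0.70, cache_hit_rate=0.92,
                    write_fraction=0.05)
print(f"DB-bound load: {load:.0f} req/s, {load / 8_000:.1%} of an 8,000 qps cap")
```

Redis keeps these counters cumulatively since server start, so a practical monitor would compute the rate over the difference between two samples rather than over the lifetime totals.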
Track the Redis hit rate from INFO stats, computed as keyspace_hits / (keyspace_hits + keyspace_misses), and alert if it drops below 85%. A sudden drop indicates a cache stampede, a key pattern change, or a misconfigured TTL that expires entries too aggressively.
Summary: The Complete Cache Design Process
Adding a caching layer to a read-heavy system is not a single decision — it is a six-step engineering process:
- Profile the workload — measure read/write ratio, top queries, and latency per endpoint.
- Filter cache candidates — high read ratio, expensive to produce, tolerates brief staleness.
- Assign a strategy per object type — cache-aside, write-through, or background refresh depending on access patterns and consistency requirements.
- Design the tiered topology — browser → CDN → L1 local → L2 distributed cache → DB. Each tier reduces load on the next.
- Design invalidation explicitly — event-driven is better than TTL-only for mutable data; decoupled fanout for high-fan-out writes.
- Verify with numbers and monitor in production — hit ratio, P50/P99 latency, DB qps before and after.
The result for our social content platform: a system that was saturating its database at peak load now runs at a small fraction of DB capacity (under 3% by the Stage 6 estimate), with typical app-tier response times of a few milliseconds instead of ~140 ms, an improvement of more than 20×. Consistency is eventual: feeds are refreshed by event-driven invalidation and bounded by the 60 s TTL, and profiles are at most 30 s stale.