Caching & CDNs

Project: Add a Caching Layer

18 min Lesson 10 of 10

Throughout this tutorial we have studied caching theory — strategies, eviction policies, invalidation, distributed stores, CDNs, and pitfalls. Now we bring it all together. This capstone lesson walks you through the complete design process for adding a caching layer to a real-world read-heavy system: a social content platform (think a news feed, product catalogue, or blog aggregator) that serves 50 million requests per day and is struggling with database saturation at peak hours.

We will work through six stages: understand the workload, identify what to cache, choose strategies, design the topology, handle invalidation, and verify the result with concrete numbers. By the end you will have a complete, defensible cache architecture diagram.

Stage 1 — Understand the Workload

Before writing a single line of cache config, measure what you actually have. For our platform the profiling reveals:

  • 50 M requests/day (~580 req/s average, ~2,500 req/s at peak)
  • Read/write ratio: 95:5 — 95% of operations are reads
  • Top 5 endpoints by DB query count: feed (home timeline), post detail, user profile, search suggestions, trending topics
  • Slowest queries: feed aggregation (~120 ms), full-text search (~80 ms), recommendation engine (~200 ms)
  • Data freshness requirement: feeds tolerate up to 60 s of staleness; trending topics up to 5 min; user profiles up to 30 s
Key Principle: Cache design starts with measurement, not assumption. A query that feels slow may be called rarely; a fast query called millions of times may be the real bottleneck. Profile first, optimize second.
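To make the principle concrete, here is a minimal sketch that ranks queries by total database time per day (calls × average latency) rather than by latency alone. The latencies come from the profile above; the call counts are hypothetical placeholders, not measurements from the platform.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class QueryStat:
    name: str
    calls_per_day: int   # hypothetical values in this sketch
    avg_ms: float        # taken from the profile above

    @property
    def total_seconds_per_day(self) -> float:
        # Total DB time this query consumes in a day.
        return self.calls_per_day * self.avg_ms / 1000.0


def rank_by_total_time(stats: List[QueryStat]) -> List[QueryStat]:
    """Order queries by total time consumed, most expensive first."""
    return sorted(stats, key=lambda s: s.total_seconds_per_day, reverse=True)


stats = [
    QueryStat("recommendation engine", 200_000, 200.0),
    QueryStat("feed aggregation", 20_000_000, 120.0),
    QueryStat("full-text search", 3_000_000, 80.0),
]
ranked = rank_by_total_time(stats)
for s in ranked:
    print(f"{s.name:<24}{s.total_seconds_per_day:>12,.0f} s/day")

# Sanity check of the average request rate quoted in the profile.
avg_rps = 50_000_000 / 86_400
print(f"average load: {avg_rps:.0f} req/s")
```

With these assumed call counts, the slowest query (the recommendation engine) ranks last, because it is called far less often than the feed aggregation.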

Stage 2 — Identify Cache Candidates

Not everything deserves to be cached. Apply the three-question filter to every data object:

  1. Is it read far more often than it is written? (read:write ratio > 10:1 is a good heuristic)
  2. Is it expensive to compute or fetch? (involves joins, aggregations, external API calls, or ML inference)
  3. Can it tolerate a short window of staleness? (even a 30-second TTL lets the cache absorb nearly all repeated reads of a hot row)
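The three questions can be written down as a small predicate. This is only a sketch: the `DataObject` fields, the `min_cost_ms` threshold, and the example values are assumptions for illustration; only the 10:1 ratio heuristic comes from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataObject:
    name: str
    reads_per_write: float   # read:write ratio
    fetch_cost_ms: float     # cost to compute or fetch on a miss
    max_staleness_s: float   # 0 means the data must always be fresh


def is_cache_candidate(obj: DataObject,
                       min_ratio: float = 10.0,
                       min_cost_ms: float = 10.0) -> bool:
    """Apply the three-question filter; all three must hold."""
    read_heavy = obj.reads_per_write > min_ratio
    expensive = obj.fetch_cost_ms >= min_cost_ms      # assumed threshold
    tolerates_staleness = obj.max_staleness_s > 0
    return read_heavy and expensive and tolerates_staleness


# Hypothetical objects: a feed, and a balance that must never be stale.
feed = DataObject("home timeline", reads_per_write=50, fetch_cost_ms=120, max_staleness_s=60)
balance = DataObject("account balance", reads_per_write=3, fetch_cost_ms=5, max_staleness_s=0)

print(feed.name, is_cache_candidate(feed))
print(balance.name, is_cache_candidate(balance))
```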

Applying these filters to our platform produces the following cache plan:

  • Home timeline (feed): Pre-computed per user, stored in Redis. Read many times per session; rebuilt only on new posts. TTL: 60 s + event-driven invalidation on new post.
  • Post detail page: Rarely changes after publish (the body is edited only occasionally). Cache with a long TTL and invalidate explicitly on edit or delete. TTL: 24 h.
  • User profile summary: Name, avatar URL, follower count. Changes rarely. TTL: 30 s. Cache at the application layer.
  • Trending topics: A ranked list recomputed every 5 minutes by a background job. Cache the latest result; replace on every recomputation. TTL: 5 min.
  • Search autocomplete suggestions: A small lookup table per prefix, precomputed nightly. Cache until the next nightly rebuild, then invalidate. Served from a CDN edge for low-latency global delivery.
  • Static assets (JS, CSS, images): Immutable per deploy (content-hash in filename). Cache at CDN with Cache-Control: public, max-age=31536000, immutable.
Design Tip: Segment your cache by object type, not by endpoint. A single endpoint (e.g., the post detail page) may assemble several cached objects — post body, author profile, comment count — each with different TTLs and invalidation rules.
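The static-asset entry in the plan depends on content-hashed filenames: when the content changes, the URL changes, so the old URL can safely be cached for a year. A minimal sketch of that idea follows, assuming a hypothetical helper `hashed_name` and a SHA-256 digest truncated to 8 hex characters (both are illustrative choices; real build tools pick their own scheme).

```python
import hashlib
from pathlib import PurePosixPath

# Header value quoted in the cache plan above.
IMMUTABLE = "public, max-age=31536000, immutable"


def hashed_name(path: str, content: bytes, digest_len: int = 8) -> str:
    """Return the path with a content digest inserted before the extension."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


v1 = hashed_name("static/app.js", b"console.log('v1');")
v2 = hashed_name("static/app.js", b"console.log('v2');")

print(v1, "->", IMMUTABLE)
print(v2, "->", IMMUTABLE)
```

Because the two deploys produce different filenames, neither needs to be purged; the old file simply stops being referenced.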

Stage 3 — Choose Strategies per Object Type

Different data shapes call for different caching strategies. Here is the mapping for our system:

  • Feed / timeline: Cache-Aside + event-driven invalidation on fan-out. When a user publishes a post, the write path emits an event and a fanout worker invalidates each follower's cached feed in Redis. On read, if the feed key is missing (cold start, expired, or invalidated), load from DB and re-populate (cache-aside / lazy load).
  • Post detail: Cache-Aside with long TTL. First read populates the cache. Subsequent reads hit cache. On edit, explicitly delete the cache key (invalidate-on-write). On delete, same.
  • User profile: Cache-Aside with short TTL. Short TTL (30 s) means stale data self-heals quickly. Explicit invalidation only for critical fields like account suspension.
  • Trending topics: Background Refresh (refresh-ahead). A cron job recomputes the list and writes it directly to the cache every 5 minutes, so the read path never computes it. Clients always read from cache; as long as each refresh lands before the previous value expires, there is never a miss for this key after warm-up.
  • Static assets: CDN cache with long TTL + cache-busting by filename hash. New deploy = new filename = cache miss once globally; old URL continues serving the old file from CDN edge until it expires naturally.
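The cache-aside and invalidate-on-write pattern described for post detail can be sketched as follows. `TTLCache` is a tiny in-memory stand-in for a Redis-like store, and `get_post`, `update_post`, and the dict-based `db` are hypothetical names used only for illustration.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for a Redis-like store (get / set with TTL / delete)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:   # expired: treat as a miss
            del self._data[key]
            return None
        return value

    def set(self, key: str, value: Any, ttl_s: float) -> None:
        self._data[key] = (self._clock() + ttl_s, value)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


def get_post(cache: TTLCache, db: Dict[int, dict], post_id: int,
             ttl_s: float = 24 * 3600) -> dict:
    """Cache-aside read: try the cache, fall back to the DB, then populate."""
    key = f"post:{post_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    post = db[post_id]            # the expensive query in a real system
    cache.set(key, post, ttl_s)
    return post


def update_post(cache: TTLCache, db: Dict[int, dict], post_id: int, body: str) -> None:
    """Invalidate-on-write: update the DB first, then delete the cache key."""
    db[post_id] = {"id": post_id, "body": body}
    cache.delete(f"post:{post_id}")


db = {1: {"id": 1, "body": "hello"}}
cache = TTLCache()
print(get_post(cache, db, 1)["body"])   # miss: loaded from db and cached
update_post(cache, db, 1, "hello, edited")
print(get_post(cache, db, 1)["body"])   # miss again after invalidation
```

Deleting the key rather than overwriting it keeps the write path simple: the next reader repopulates the entry from the source of truth.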

Stage 4 — Design the Full Topology

With the strategy per object type decided, we can now draw the full architecture. The system has three distinct caching tiers working in concert:

Figure: Full caching topology for a read-heavy social content platform.
Complete caching topology: browser HTTP cache → CDN edge (static + cached pages) → app-server local memory cache → Redis cluster → read replica → primary DB. Background workers pre-warm time-sensitive keys.

The three active caching tiers are:

  • Tier 1 — CDN Edge: Serves static assets (JS, CSS, images) and cacheable full pages (e.g., public post pages with Cache-Control: s-maxage=60). Latency: typically <20 ms from a nearby PoP. Hit ratio for static: ~99%. Hit ratio for pages: ~70–80%.
  • Tier 2 — Application-layer local memory cache (L1): Each app server holds a small in-process cache (e.g., 256 MB using a library like Caffeine in Java or lru-cache in Node). This avoids even the Redis network hop for extremely hot keys (e.g., global config, feature flags, top trending topic). TTL: 5–30 s. Data is per-process, so consistency is relaxed — only use for data where brief inter-node divergence is acceptable.
  • Tier 3 — Redis Cluster (L2): The central distributed cache shared by all app servers. Holds user feeds, post objects, user profiles, session tokens, and rate-limit counters. Latency: ~0.5–2 ms. Hit ratio target: >90%.
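The read path through Tier 2 and Tier 3 can be sketched as a lookup that tries the in-process cache first, then the shared cache, and only then the database. In this sketch a plain dict stands in for the Redis cluster, `loader` stands in for the DB query, and `TieredCache` is a hypothetical name; a real L1 would also be size-bounded (for example by an LRU policy), which is omitted here.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TieredCache:
    """L1 (per-process, short TTL) in front of a shared L2, in front of a loader."""

    def __init__(self, l2: Dict[str, Any], loader: Callable[[str], Any],
                 l1_ttl_s: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._l1: Dict[str, Tuple[float, Any]] = {}
        self._l2 = l2                 # shared between app servers
        self._loader = loader         # stand-in for the database read
        self._l1_ttl_s = l1_ttl_s
        self._clock = clock
        self.served_from = {"l1": 0, "l2": 0, "db": 0}

    def get(self, key: str) -> Any:
        entry = self._l1.get(key)
        if entry is not None and self._clock() < entry[0]:
            self.served_from["l1"] += 1
            return entry[1]
        if key in self._l2:
            value = self._l2[key]
            self.served_from["l2"] += 1
        else:
            value = self._loader(key)
            self._l2[key] = value     # populate the shared tier
            self.served_from["db"] += 1
        # Refresh the local copy with a short TTL.
        self._l1[key] = (self._clock() + self._l1_ttl_s, value)
        return value


now = [0.0]                           # fake clock so the demo is deterministic
shared_l2: Dict[str, Any] = {}


def load(key: str) -> str:
    return f"value-of-{key}"


server1 = TieredCache(shared_l2, load, clock=lambda: now[0])
server2 = TieredCache(shared_l2, load, clock=lambda: now[0])

server1.get("trending:top")           # DB load, fills L2 and server1's L1
server1.get("trending:top")           # served from server1's L1
server2.get("trending:top")           # server2's L1 is cold, served from L2
now[0] = 11.0                         # server1's L1 entry has expired
server1.get("trending:top")           # served from L2 again
print(server1.served_from, server2.served_from)
```

Each server's L1 holds its own copy, which is why the tier is only suitable for data where brief divergence between nodes is acceptable.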

Stage 5 — Invalidation Design

The topology tells us where things live; invalidation design tells us when and how to make them stale. Here is the per-object invalidation plan:

Figure: Invalidation flow when a user publishes a post. User publishes (POST /posts) → app server validates and writes to the primary DB → post.published event on the message queue → fanout worker reads the follower list → deletes Redis feed keys and purges cached CDN pages. Result: the next feed request misses Redis, reads from the DB, and repopulates the cache; subsequent feed requests hit Redis (<1 ms). Stale window: <2 s.
Invalidation flow on post publish: the app writes to DB, emits an event to a message queue, and a fanout worker deletes stale Redis feed keys and purges CDN pages — ensuring consistency within 2 seconds.

The key design decisions in this invalidation scheme:

  • Event-driven invalidation via a message queue decouples the write path from cache management. The HTTP request that handles the POST does not block waiting for Redis DEL operations across thousands of follower keys — the fanout happens asynchronously.
  • Delete, do not update, on fanout. For authors with thousands of followers, pushing the new post into every follower's feed in Redis synchronously would take seconds. Deleting a feed key is a cheap constant-time operation, and the next read re-populates that feed lazily.
  • CDN purge for public pages. A cached post detail page at the CDN edge must be purged when the post is edited or deleted. Major CDN providers (Cloudflare, Fastly, Akamai) expose a purge API for exactly this purpose. Issue the purge from the asynchronous worker rather than the request handler, and budget for the latency of the purge call (~50–200 ms) as part of the stale window.
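The decoupling described above can be sketched with a queue between the write path and a fanout worker. Everything here is an in-memory stand-in: `queue.Queue` plays the message queue, a dict plays the Redis feed cache, and `publish_post` and `run_fanout_worker` are hypothetical names. Keys are deleted one per follower, since Redis DEL takes explicit key names rather than patterns.

```python
import queue
from typing import Dict, List, Set


def publish_post(db_posts: List[dict], events: "queue.Queue[dict]",
                 author_id: str, post_id: int) -> None:
    """Write path: persist the post, emit an event, and return immediately."""
    db_posts.append({"id": post_id, "author": author_id})
    events.put({"type": "post.published", "author": author_id, "post": post_id})


def run_fanout_worker(events: "queue.Queue[dict]",
                      followers: Dict[str, Set[str]],
                      feed_cache: Dict[str, list]) -> int:
    """Drain the queue and delete each follower's cached feed key.

    Returns the number of feed keys deleted.
    """
    deleted = 0
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            return deleted
        if event["type"] != "post.published":
            continue
        for follower_id in followers.get(event["author"], set()):
            key = f"feed:{follower_id}"
            if key in feed_cache:
                del feed_cache[key]   # next read repopulates lazily
                deleted += 1


db_posts: List[dict] = []
events: "queue.Queue[dict]" = queue.Queue()
followers = {"alice": {"bob", "carol"}}
feed_cache = {"feed:bob": [1, 2], "feed:carol": [2], "feed:dave": [3]}

publish_post(db_posts, events, "alice", 42)
keys_before_worker = len(feed_cache)  # write path did not touch the cache
deleted = run_fanout_worker(events, followers, feed_cache)
print(keys_before_worker, deleted, sorted(feed_cache))
```

In production the worker would run continuously in its own process; draining the queue in one call here just keeps the example deterministic.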

Stage 6 — Verify with Numbers

A caching design is only credible if you can show the expected impact. Let us run the numbers for our platform:

  • Before: Peak 2,500 req/s, average 5 DB queries per request = 12,500 queries/s hitting the DB. With a single Primary DB capped at ~8,000 qps, the system is already over capacity.
  • After (CDN hit rate 70% for dynamic pages): 2,500 × 0.30 = 750 req/s reach the app servers.
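  • After (Redis hit rate 92% across all object types): 750 × 0.08 = 60 req/s miss the cache and reach the DB, plus ~125 req/s of writes (5% of peak) = ~185 req/s. At one DB query per miss or write that is ~185 qps, under 3% of the 8,000 qps ceiling; even at the pre-cache average of 5 queries per request (~925 qps) the DB stays below 12% of capacity.
  • Response time: CDN hit: ~15 ms. Redis hit: ~3 ms. Cache miss: ~45 ms. Request-weighted mean: 0.70 × 15 + 0.30 × (0.92 × 3 + 0.08 × 45) ≈ 12 ms (down from ~140 ms without cache).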
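The funnel arithmetic is easy to re-run with your own measurements. The sketch below uses hypothetical helpers `db_load` and `mean_latency_ms`; the inputs are the figures from this lesson, and `queries_per_request` is an added parameter that defaults to one DB query per miss or write.

```python
def db_load(peak_rps: float, cdn_hit: float, redis_hit: float,
            write_fraction: float, queries_per_request: int = 1) -> float:
    """Queries per second reaching the DB after the CDN and Redis tiers."""
    origin_rps = peak_rps * (1 - cdn_hit)        # requests that pass the CDN
    miss_rps = origin_rps * (1 - redis_hit)      # requests that miss Redis
    write_rps = peak_rps * write_fraction        # writes always reach the DB
    return (miss_rps + write_rps) * queries_per_request


def mean_latency_ms(cdn_hit: float, redis_hit: float,
                    cdn_ms: float, redis_ms: float, miss_ms: float) -> float:
    """Request-weighted mean latency (a mean, not a percentile)."""
    origin_ms = redis_hit * redis_ms + (1 - redis_hit) * miss_ms
    return cdn_hit * cdn_ms + (1 - cdn_hit) * origin_ms


DB_CAPACITY_QPS = 8_000

for qpr in (1, 5):
    qps = db_load(2_500, 0.70, 0.92, 0.05, queries_per_request=qpr)
    print(f"{qpr} query/request: {qps:,.0f} qps "
          f"({qps / DB_CAPACITY_QPS:.1%} of capacity)")

print(f"mean latency: {mean_latency_ms(0.70, 0.92, 15, 3, 45):.1f} ms")
```

Varying the hit ratios in this script is a quick way to see how sensitive the DB load is to each tier before committing to a design.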
Realistic Hit Ratios: A 92% Redis hit ratio requires the hot working set to fit comfortably in your Redis memory budget. If that set is the top 1,000 most-requested objects at 1 KB each, it is only 1 MB, which is trivial; per-user feeds for millions of users need proportionally more. The harder problem is usually getting key expiry and invalidation right, not memory size.
Do not skip the monitoring step. After deploying your caching layer, instrument Redis with INFO stats (hit rate = keyspace_hits / (keyspace_hits + keyspace_misses)) and alert if the hit rate drops below 85%. A sudden drop indicates a cache stampede, a key pattern change, or a misconfigured TTL that is expiring keys too aggressively.
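Because keyspace_hits and keyspace_misses are cumulative counters, a useful hit rate comes from the difference between two snapshots rather than from the raw totals. The sketch below assumes the snapshots have already been fetched and parsed into dicts; `hit_ratio` and `should_alert` are hypothetical helper names, and the sample counter values are made up.

```python
from typing import Dict, Optional


def hit_ratio(prev: Dict[str, int], curr: Dict[str, int]) -> Optional[float]:
    """Hit ratio over the interval between two INFO stats snapshots.

    Returns None when there was no traffic or the counters went backwards
    (for example after a restart), so callers can skip that interval.
    """
    hits = curr["keyspace_hits"] - prev["keyspace_hits"]
    misses = curr["keyspace_misses"] - prev["keyspace_misses"]
    if hits < 0 or misses < 0 or hits + misses == 0:
        return None
    return hits / (hits + misses)


def should_alert(ratio: Optional[float], threshold: float = 0.85) -> bool:
    """Alert only when a measured ratio falls below the threshold."""
    return ratio is not None and ratio < threshold


# Made-up snapshots taken at three consecutive sampling points.
t0 = {"keyspace_hits": 1_000, "keyspace_misses": 100}
t1 = {"keyspace_hits": 1_920, "keyspace_misses": 180}
t2 = {"keyspace_hits": 2_520, "keyspace_misses": 380}

for prev, curr in ((t0, t1), (t1, t2)):
    ratio = hit_ratio(prev, curr)
    print(f"hit ratio {ratio:.2f}, alert={should_alert(ratio)}")
```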

Summary: The Complete Cache Design Process

Adding a caching layer to a read-heavy system is not a single decision — it is a six-step engineering process:

  1. Profile the workload — measure read/write ratio, top queries, and latency per endpoint.
  2. Filter cache candidates — high read ratio, expensive to produce, tolerates brief staleness.
  3. Assign a strategy per object type — cache-aside, write-through, or background refresh depending on access patterns and consistency requirements.
  4. Design the tiered topology — browser → CDN → L1 local → L2 distributed cache → DB. Each tier reduces load on the next.
  5. Design invalidation explicitly — event-driven is better than TTL-only for mutable data; decoupled fanout for high-fan-out writes.
  6. Verify with numbers and monitor in production — hit ratio, P50/P99 latency, DB qps before and after.

The result for our social content platform: a system that was saturating its database at peak load now runs at a few percent of DB capacity with a request-weighted mean response time of about 12 ms (0.70 × 15 ms at the CDN plus 0.30 × ~6.4 ms at the origin), roughly an 11× improvement over ~140 ms, while maintaining eventual consistency with a bounded stale window: about 2 seconds for feeds and 30 seconds for profiles.
