Caching & CDNs

Where to Cache: The Layers

Caching is not a single switch you flip; it is a stack of decisions. The same piece of data can be cached in five distinct places as it travels from a database on a server in Virginia to a browser in Tokyo. Each layer has a different cost, a different scope, and a different set of trade-offs. Understanding where to cache is as important as knowing what to cache.

The five main caching layers, from closest to the user (Client) to deepest (Database), with typical hit latency, scope, and common implementations.

Layer 1 — Client Cache

The fastest cache is the one that never leaves the user's device. Browsers store HTTP responses in a local disk/memory cache governed entirely by response headers you send from the server:

Cache-Control: max-age=86400 — the browser may serve this response for up to 24 hours without going to the network at all.
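ETag / Last-Modified: conditional revalidation. Once the cached copy has expired, the browser asks "has this changed?" and downloads the body only if the server says yes. If nothing changed, the server answers 304 Not Modified with no body, which saves bandwidth even though the cached copy was stale.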
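The two headers above can be sketched as a tiny server-side handler. This is a minimal illustration, not a real framework API: make_etag and respond are hypothetical helper names, and the comparison is simplified (a real If-None-Match header can list several ETags and allows weak validators).

```python
import hashlib
from typing import Optional

def make_etag(body: bytes) -> str:
    """Derive an ETag from the body; the quotes are part of the header syntax."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def respond(body: bytes, if_none_match: Optional[str] = None):
    """Return (status, headers, payload) for a GET request."""
    etag = make_etag(body)
    headers = {"Cache-Control": "max-age=86400", "ETag": etag}
    if if_none_match == etag:
        # The cached copy is still current: confirm it without resending the body.
        return 304, headers, b""
    return 200, headers, body

css = b"body { color: black; }"
status, headers, payload = respond(css)                # first visit: full download
revalidated, _, empty = respond(css, headers["ETag"])  # revalidation after expiry
print(status, revalidated, len(empty))  # 200 304 0
```

Within the 24-hour max-age window the browser would not call the server at all; the 304 path only matters once that window has passed.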

Client caching is private per user. A CSS file cached in one user's browser does nothing for another user's first visit. But for returning visitors it eliminates network round-trips entirely: a cache hit incurs no network latency at all, only a local memory or disk read. Google reports that roughly 85% of web resources are cacheable, yet many sites send Cache-Control: no-store on everything by default, leaving enormous performance on the table.

Fingerprint static assets. Name files like app.a3f91c.js (content hash in the filename). Then serve them with Cache-Control: max-age=31536000, immutable. When the file changes, the URL changes, so the browser fetches the new version immediately, while the old one stays cached for up to a year (the max-age value) for users who already have it.
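As a rough sketch of what a build step might do, the function below derives the fingerprinted name from the file's content. The fingerprint helper and the 6-character hash length are assumptions chosen to match the app.a3f91c.js example; real bundlers have their own naming schemes.

```python
import hashlib
from pathlib import PurePosixPath

# Header value quoted in the text for fingerprinted assets.
IMMUTABLE_CACHE_CONTROL = "max-age=31536000, immutable"

def fingerprint(filename: str, content: bytes, length: int = 6) -> str:
    """Insert a short content hash before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))

name_v1 = fingerprint("app.js", b"console.log('v1');")
name_v2 = fingerprint("app.js", b"console.log('v2');")
print(name_v1, name_v2)  # two different URLs, one per version of the file
```

Because the name depends only on the bytes, an unchanged file keeps its URL across deploys and stays cached, while any edit produces a new URL.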

Layer 2 — CDN / Edge Cache

A Content Delivery Network operates hundreds of Points of Presence (PoPs) around the globe. When a user in Singapore requests your site hosted in Frankfurt, the CDN serves the cached response from a PoP in Singapore — cutting latency from ~200 ms to ~5 ms.

CDNs are shared caches: every user who hits the same PoP benefits from an earlier visitor's warm cache. They are ideal for assets and pages that are identical for all users: images, fonts, JavaScript bundles, HTML for non-personalized pages, and API responses that can be public.
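The shared-cache behaviour can be illustrated with a toy model. EdgePoP and origin are hypothetical names, and the sketch ignores TTLs, eviction, and headers; it only shows that the second visitor to a PoP is served without touching the origin.

```python
class EdgePoP:
    """Toy model of one CDN Point of Presence with a shared cache."""

    def __init__(self, name, fetch_from_origin):
        self.name = name
        self._fetch_from_origin = fetch_from_origin
        self._cache = {}  # shared by every user routed to this PoP

    def get(self, url):
        if url in self._cache:
            return self._cache[url], "HIT"
        body = self._fetch_from_origin(url)  # long trip back to the origin
        self._cache[url] = body
        return body, "MISS"

origin_requests = []

def origin(url):
    origin_requests.append(url)
    return f"<contents of {url}>"

singapore = EdgePoP("SIN", origin)
_, first = singapore.get("/logo.png")   # first visitor warms the cache
_, second = singapore.get("/logo.png")  # a different visitor, same PoP
print(first, second, len(origin_requests))  # MISS HIT 1
```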

Major CDNs — Cloudflare, AWS CloudFront, Fastly — also run logic at the edge (edge functions, edge workers) so you can do lightweight personalisation or A/B testing without a round-trip to your origin.

CDN vs Browser Cache. A browser cache is private (one user). A CDN cache is shared (many users from one region). Both use the same Cache-Control directives, but CDNs also respect s-maxage (shared cache TTL) which lets you set a shorter TTL for browsers while keeping a longer one at the CDN.
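To make the s-maxage rule concrete, here is a simplified sketch of how a cache might pick its TTL from a Cache-Control header. The function names are illustrative, and the logic deliberately ignores other directives (no-cache, must-revalidate, heuristic freshness) that real caches also honour.

```python
def parse_cache_control(header: str) -> dict:
    """Split a Cache-Control header into {directive: value or None}."""
    directives = {}
    for part in header.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name] = value.strip('"') or None
    return directives

def freshness_lifetime(header: str, shared: bool) -> int:
    """Seconds a response may be served from cache (simplified)."""
    d = parse_cache_control(header)
    if "no-store" in d:
        return 0
    if shared and "private" in d:
        return 0  # shared caches such as CDNs must not store private responses
    if shared and d.get("s-maxage") is not None:
        return int(d["s-maxage"])  # s-maxage overrides max-age for shared caches
    if d.get("max-age") is not None:
        return int(d["max-age"])
    return 0

header = "public, max-age=60, s-maxage=3600"
print(freshness_lifetime(header, shared=False))  # 60   (browser)
print(freshness_lifetime(header, shared=True))   # 3600 (CDN)
```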

Layer 3 — Application-Level Cache

Once a request reaches your servers, you have two sub-options:

In-process (local memory) cache — a hash map inside the running process. Libraries like Caffeine (Java), Guava Cache (Java), or functools.lru_cache (Python) store key-value pairs in the application's own heap. A hit costs a few hundred nanoseconds — essentially free. The trade-off: each app server has its own independent cache. On a 10-server fleet, the same data may be cached 10 times, and invalidation is difficult across servers.
Reverse-proxy cache — tools like Varnish or Nginx can sit in front of your application and cache entire HTTP responses. The application itself never runs for a cache hit.

Local in-process caches are best for immutable or rarely-changing reference data — country codes, feature flags, configuration objects — where the cost of staleness is low and the read frequency is very high.
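A minimal in-process cache using functools.lru_cache from the Python standard library might look like the sketch below. The _COUNTRY_NAMES dictionary is a hypothetical stand-in for a slower source such as a database table or a configuration service.

```python
from functools import lru_cache

# Hypothetical reference data; in a real service this would be a slow lookup.
_COUNTRY_NAMES = {"JP": "Japan", "DE": "Germany", "SG": "Singapore"}

@lru_cache(maxsize=1024)
def country_name(code: str) -> str:
    """Look up a country name; results are memoised in this process's heap."""
    return _COUNTRY_NAMES[code]

country_name("JP")  # miss: runs the function body
country_name("JP")  # hit: answered from the in-process cache
info = country_name.cache_info()
print(info.hits, info.misses)  # 1 1
```

country_name.cache_clear() empties the cache, but only in the process where it is called, which is exactly the per-server limitation described above.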

In-process cache + horizontal scaling = inconsistency. If user A updates a record and their request lands on Server 1, Server 2 may serve stale cached data to user B until its TTL expires. For mutable data with strong consistency requirements, skip the in-process layer and go straight to a shared cache.
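The stale-read scenario can be reproduced in a few lines. AppServer is a hypothetical toy class, the dictionary stands in for the real database, and TTL expiry is left out so that the inconsistency stays visible.

```python
class AppServer:
    """Toy app server with a private in-process cache in front of a shared DB."""

    def __init__(self, name, database):
        self.name = name
        self.database = database  # shared source of truth
        self.local_cache = {}     # private to this process

    def read(self, key):
        if key not in self.local_cache:
            self.local_cache[key] = self.database[key]
        return self.local_cache[key]

    def write(self, key, value):
        self.database[key] = value
        self.local_cache[key] = value  # only this server's cache is refreshed

database = {"user:1:email": "old@example.com"}
server_1 = AppServer("server-1", database)
server_2 = AppServer("server-2", database)

server_2.read("user:1:email")                      # warms Server 2's cache
server_1.write("user:1:email", "new@example.com")  # user A's update lands on Server 1

print(server_1.read("user:1:email"))  # new@example.com
print(server_2.read("user:1:email"))  # old@example.com (stale)
```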

Layer 4 — Distributed Cache (Redis / Memcached)

A dedicated caching tier, typically Redis or Memcached, sits between your application and the database. Unlike in-process caches, all application servers share the same cache cluster. A key written by any server is readable by every other server on its next lookup (subject to replication lag if reads go to replicas).

This is the workhorse of production caching:

A cache hit costs ~0.5–2 ms (network call) instead of 5–50 ms for a database query.
At Twitter's scale (2013 data), a single trending tweet's metadata is fetched tens of thousands of times per second — a database could never serve that; Memcached can.
Redis adds rich data structures (sorted sets, streams, pub/sub) that let it do far more than pure caching — rate limiting, session storage, leaderboards.
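The sketch below shows why a shared tier takes load off the database: two requests, which could come from different app servers, go through the same cache and trigger only one database load. InMemorySharedCache is a hypothetical in-memory stand-in so the example runs without a server; its get and set(key, value, ex=...) methods are modelled on the redis-py client, which is an assumption about the client you would use in practice. get_tweet_metadata and load_from_db are likewise illustrative names.

```python
import json
import time

class InMemorySharedCache:
    """Stand-in for a cache cluster; a real deployment would use a network client."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: behave as a miss
            return None
        return value

    def set(self, key, value, ex):
        self._data[key] = (value, self._clock() + ex)

def get_tweet_metadata(cache, tweet_id, load_from_db, ttl_seconds=30):
    """Check the shared cache first; fall back to the database on a miss."""
    key = f"tweet:{tweet_id}:meta"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    value = load_from_db(tweet_id)
    cache.set(key, json.dumps(value), ex=ttl_seconds)
    return value

db_calls = []

def load_from_db(tweet_id):
    db_calls.append(tweet_id)  # count how often the database is actually hit
    return {"id": tweet_id, "likes": 1200}

shared = InMemorySharedCache()
first = get_tweet_metadata(shared, 42, load_from_db)   # "server 1": miss, loads from DB
second = get_tweet_metadata(shared, 42, load_from_db)  # "server 2": hit, no DB call
print(first == second, len(db_calls))  # True 1
```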

Layer 5 — Database-Level Cache

The database engine itself has caches you can tune without changing any application code. MySQL's InnoDB Buffer Pool is a region of RAM that holds recently accessed pages (table data and index pages). When the pages a query needs are already in the buffer pool, no disk I/O occurs and the engine returns results from RAM. Sizing this correctly (typically 70–80% of available RAM on a dedicated DB server) is one of the highest-leverage database performance improvements possible.
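As a quick worked example of the 70–80% guideline, the helper below computes the range for a given amount of RAM. buffer_pool_range is a hypothetical name, and the 64 GiB server is an assumed example; the result would be applied through MySQL's innodb_buffer_pool_size setting.

```python
GIB = 1024 ** 3

def buffer_pool_range(total_ram_bytes: int, low_pct: int = 70, high_pct: int = 80):
    """Return the (low, high) buffer pool size in bytes for a dedicated DB server."""
    return total_ram_bytes * low_pct // 100, total_ram_bytes * high_pct // 100

low, high = buffer_pool_range(64 * GIB)  # assumed: a dedicated server with 64 GiB RAM
print(f"innodb_buffer_pool_size between {low // GIB} GiB and {high // GIB} GiB")
# innodb_buffer_pool_size between 44 GiB and 51 GiB
```

The remaining RAM is left for the operating system, connection buffers, and other engine structures, which is why the guideline stops short of 100%.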

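Some databases have also offered a query result cache, most notably MySQL's query cache, which was deprecated in MySQL 5.7 and removed in 8.0. (PostgreSQL's plan cache is a different mechanism: it stores execution plans for prepared statements, not result sets.) Result caches have largely fallen out of favor in modern engines because they introduced lock contention problems; the explicit caching layers above give you more control.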

Choosing the Right Layer(s)

In practice you use multiple layers simultaneously. A request for a product image might check the browser cache (0 ms) and miss, check the CDN PoP (4 ms) and miss, check Redis (1 ms) and miss, check the DB buffer pool (2 ms) and miss, and finally read from disk (10 ms). The goal is for the vast majority of requests to be stopped at one of the earlier, cheaper layers.

A request traverses the cache layers left-to-right until it hits. A hit at any layer returns the response immediately. Every layer crossed is additional latency.
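The traversal can be turned into simple arithmetic. The sketch below reuses the illustrative per-layer latencies quoted above; the serve and expected_latency helpers, and any hit rates you feed them, are assumptions for illustration rather than measurements.

```python
# (layer name, lookup latency in ms), using the illustrative figures from the text
LAYERS = [
    ("browser cache", 0),
    ("CDN PoP", 4),
    ("Redis", 1),
    ("DB buffer pool", 2),
    ("disk", 10),
]

def serve(hit_layer: str) -> int:
    """Total latency when the request is finally answered at hit_layer."""
    total = 0
    for name, latency in LAYERS:
        total += latency  # every layer crossed adds its lookup cost
        if name == hit_layer:
            return total
    raise ValueError(f"unknown layer: {hit_layer}")

def expected_latency(hit_rates) -> float:
    """Average latency given each layer's hit rate for requests that reach it.

    The last entry should be 1.0, because disk always has the data.
    """
    expected, reach, cumulative = 0.0, 1.0, 0
    for (_, latency), rate in zip(LAYERS, hit_rates):
        cumulative += latency
        expected += reach * rate * cumulative
        reach *= 1 - rate  # fraction of requests that fall through to the next layer
    return expected

print(serve("CDN PoP"))  # 4
print(serve("disk"))     # 17
```

Calling expected_latency with high hit rates for the first layers shows how the average is dominated by the cheap layers, which is the point of stacking them.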

The layers are complementary, not competing. Static assets belong at the client and CDN layers. Computed API responses belong at the Redis layer. Reference data read on every request belongs in an in-process cache. Hot table and index pages belong in the DB buffer pool. A mature system uses all of them.

In the next lesson we will zoom in on how data gets into and out of each cache — the read-through, write-through, and write-behind cache strategies.