Where to Cache: The Layers
Caching is not a single switch you flip — it is a stack of decisions. The same piece of data can be cached in five distinct places as it travels from a database on a server in Virginia to a browser in Tokyo. Each layer has a different cost, a different scope, and a different set of trade-offs. Understanding where to cache is as important as knowing what to cache.
Layer 1 — Client Cache
The fastest cache is the one that never leaves the user's device. Browsers store HTTP responses in a local disk/memory cache governed entirely by response headers you send from the server:
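- Cache-Control: max-age=86400 — the browser may serve this response for up to 24 hours without going to the network at all.
- ETag / Last-Modified — conditional revalidation. Once the cached copy is stale, the browser asks the server "has this changed?" (by sending If-None-Match or If-Modified-Since) and downloads the body only if it has. An unchanged resource comes back as a 304 Not Modified response with no body, which saves bandwidth even when the cached copy is stale.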
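To make the revalidation flow concrete, here is a minimal, framework-free sketch of the server side. The function names make_etag and respond are hypothetical, and deriving the ETag from a SHA-256 hash of the body is only one possible choice. Real frameworks and web servers generate validators in their own ways.

```python
from __future__ import annotations

import hashlib


def make_etag(body: bytes) -> str:
    """Strong ETag derived from the body: a quoted SHA-256 prefix (illustrative choice)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: str | None) -> tuple[int, dict[str, str], bytes]:
    """Return (status, headers, body) for a GET, honoring If-None-Match."""
    etag = make_etag(body)
    headers = {
        "Cache-Control": "max-age=86400",  # fresh for 24 h: no network trip at all
        "ETag": etag,                       # validator used once the copy goes stale
    }
    if if_none_match is not None:
        # The header may list several tags, and If-None-Match uses weak
        # comparison, so a W/ prefix is ignored when matching.
        tags = [t.strip() for t in if_none_match.split(",")]
        tags = [t[2:] if t.startswith("W/") else t for t in tags]
        if "*" in tags or etag in tags:
            # The client's copy is still current: 304 with validators, no body.
            return 304, headers, b""
    return 200, headers, body


# Usage: a first fetch, then a revalidation with the ETag the browser saved.
status, headers, _ = respond(b"body { color: red }", None)            # 200, full body
status2, _, body2 = respond(b"body { color: red }", headers["ETag"])  # 304, empty body
```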
Client caching is private per user: a CSS file cached in one user's browser does nothing for another user's first visit. For returning visitors, though, it eliminates network round-trips entirely, so a cache hit incurs no network latency at all. Google reports that roughly 85 % of web resources are cacheable, yet many sites send Cache-Control: no-store on everything by default, leaving enormous performance on the table.
Tip: fingerprint static assets by embedding a content hash in the filename, for example app.a3f91c.js, and serve them with Cache-Control: max-age=31536000, immutable. When the file changes, its URL changes, so browsers fetch the new version immediately, while users who already have the old file keep serving it from cache for up to a year without revalidating.
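A build tool usually does this renaming for you. The sketch below, assuming a hypothetical fingerprint helper and a 6-character SHA-256 prefix, only illustrates the property that matters: identical content always produces the same name, and any change to the content produces a new URL.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprint(filename: str, content: bytes, digest_len: int = 6) -> str:
    """Insert a short content hash before the extension: app.js -> app.<hash>.js."""
    path = PurePosixPath(filename)
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


# Headers for any fingerprinted asset: cache for one year, and "immutable"
# tells the browser not to revalidate it even when the user reloads the page.
IMMUTABLE_HEADERS = {"Cache-Control": "max-age=31536000, immutable"}

print(fingerprint("static/app.js", b"console.log('v1')"))
```

Because the name depends only on the bytes, a deploy that leaves a file untouched keeps its old URL, and returning visitors keep their cached copy.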
Layer 2 — CDN / Edge Cache
A Content Delivery Network operates hundreds of Points of Presence (PoPs) around the globe. When a user in Singapore requests your site hosted in Frankfurt, the CDN serves the cached response from a PoP in Singapore — cutting latency from ~200 ms to ~5 ms.
CDNs are shared caches: every user who hits the same PoP benefits from an earlier visitor's warm cache. They are ideal for assets and pages that are identical for all users: images, fonts, JavaScript bundles, HTML for non-personalized pages, and API responses that can be public.
Major CDNs — Cloudflare, AWS CloudFront, Fastly — also run logic at the edge (edge functions, edge workers) so you can do lightweight personalisation or A/B testing without a round-trip to your origin.
CDNs honor the standard Cache-Control directives, but they also respect s-maxage (the TTL for shared caches), which lets you set a shorter TTL for browsers while keeping a longer one at the CDN.
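The split between browser and CDN lifetimes can be sketched with a small, hypothetical parser. It is deliberately simplified: it looks only at max-age and s-maxage and ignores private, no-store, no-cache, and heuristic freshness, all of which a real HTTP cache must also honor.

```python
from __future__ import annotations


def parse_cache_control(header: str) -> dict[str, str | None]:
    """Split a Cache-Control header into {directive: value-or-None}."""
    directives: dict[str, str | None] = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') if sep else None
    return directives


def freshness_lifetime(header: str, shared: bool) -> int | None:
    """Seconds a response stays fresh; shared caches (CDNs) prefer s-maxage."""
    d = parse_cache_control(header)
    if shared and d.get("s-maxage") is not None:
        return int(d["s-maxage"])
    if d.get("max-age") is not None:
        return int(d["max-age"])
    return None  # no explicit lifetime (heuristics are out of scope here)


header = "public, max-age=60, s-maxage=3600"
print(freshness_lifetime(header, shared=False))  # browser: 60
print(freshness_lifetime(header, shared=True))   # CDN: 3600
```

With this header, browsers recheck after a minute while the CDN keeps serving its copy for an hour, so most revalidations stop at the edge instead of reaching your origin.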
Layer 3 — Application-Level Cache
Once a request reaches your servers, you have two sub-options:
- In-process (local memory) cache — a hash map inside the running process. Libraries like Caffeine (Java) and Guava Cache (Java), or Python's functools.lru_cache, store key-value pairs in the application's own heap. A hit costs a few hundred nanoseconds at most, which is essentially free. The trade-off: each app server has its own independent cache. On a 10-server fleet, the same data may be cached 10 times, and invalidating it consistently across servers is difficult.
- Reverse-proxy cache — tools like Varnish or Nginx sit in front of your application and cache entire HTTP responses. On a cache hit, the application itself never runs.
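As an illustration of an in-process cache, here is functools.lru_cache wrapped around a hypothetical country-code lookup (country_name and the CALLS counter are made up for the example). Note that cache_info() and cache_clear() act on this process only. Clearing the cache on one server does nothing to the copies held by the other nine, which is the invalidation problem described above.

```python
from functools import lru_cache

# Counts how often the (hypothetical) slow backing lookup actually runs.
CALLS = {"count": 0}


@lru_cache(maxsize=1024)
def country_name(code: str) -> str:
    """Hypothetical slow lookup standing in for a database or file read."""
    CALLS["count"] += 1
    return {"DE": "Germany", "JP": "Japan", "US": "United States"}.get(code, "Unknown")


for code in ["DE", "JP", "DE", "DE", "US", "JP"]:
    country_name(code)

print(country_name.cache_info())
# CacheInfo(hits=3, misses=3, maxsize=1024, currsize=3)
```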
Local in-process caches are best for immutable or rarely-changing reference data — country codes, feature flags, configuration objects — where the cost of staleness is low and the read frequency is very high.
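One common way to keep that staleness bounded is to give every entry a time-to-live, so a changed feature flag reaches every server within at most one TTL. The RefDataCache class below is a hypothetical sketch of that idea, not any library's API. It is not thread-safe and never evicts entries, which is acceptable only for small reference tables. The injectable clock exists so the expiry logic can be tested.

```python
import time
from typing import Callable, Dict, Generic, Hashable, Tuple, TypeVar

V = TypeVar("V")


class RefDataCache(Generic[V]):
    """Per-process cache whose entries expire after `ttl` seconds (hypothetical helper)."""

    def __init__(self, loader: Callable[[Hashable], V], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader   # fetches the authoritative value on a miss
        self._ttl = ttl         # upper bound on how stale an entry may get
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, V]] = {}  # key -> (expires_at, value)

    def get(self, key: Hashable) -> V:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]                 # still fresh: no backing call
        value = self._loader(key)           # missing or expired: reload
        self._entries[key] = (now + self._ttl, value)
        return value
```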
Layer 4 — Distributed Cache (Redis / Memcached)
A dedicated caching tier — typically Redis or Memcached — sits between your application and the database. Unlike in-process caches, all application servers share the same cache cluster. A key written by any server is instantly readable by every other server.
This is the workhorse of production caching:
- A cache hit costs ~0.5–2 ms (network call) instead of 5–50 ms for a database query.
- At Twitter's scale (2013 data), a single trending tweet's metadata is fetched tens of thousands of times per second — a database could never serve that; Memcached can.
- Redis adds rich data structures (sorted sets, streams, pub/sub) that let it do far more than pure caching — rate limiting, session storage, leaderboards.
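Applications usually talk to this tier with the cache-aside pattern: read the cache, fall back to the database on a miss, then populate the cache. The sketch below assumes a hypothetical product:{id} key scheme and JSON serialization. To run without a Redis server it uses a small in-memory stand-in that exposes get and setex with the same names and positional argument order as redis-py's client; in a real deployment you would pass a redis.Redis instance instead. Two calls stand in for two different app servers sharing one cache.

```python
import json
import time
from typing import Callable, Dict, Optional, Tuple


class InMemoryCache:
    """Stand-in for a Redis client: only get/setex, mirroring redis-py's names."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[float, bytes]] = {}  # key -> (expires_at, value)

    def get(self, name: str) -> Optional[bytes]:
        entry = self._data.get(name)
        if entry is None or time.monotonic() >= entry[0]:
            return None
        return entry[1]

    def setex(self, name: str, time_s: int, value: bytes) -> None:
        self._data[name] = (time.monotonic() + time_s, value)


def get_product(cache, product_id: int, load_from_db: Callable[[int], dict],
                ttl: int = 300) -> dict:
    """Cache-aside read: try the shared cache, fall back to the DB, then populate."""
    key = f"product:{product_id}"          # hypothetical key scheme
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)           # hit: one network call with a real Redis
    row = load_from_db(product_id)          # miss: pay the query cost once
    cache.setex(key, ttl, json.dumps(row).encode())
    return row


shared = InMemoryCache()
db_calls = []


def fake_db(pid: int) -> dict:
    """Hypothetical database lookup that records every call."""
    db_calls.append(pid)
    return {"id": pid, "name": "Widget"}


get_product(shared, 42, fake_db)  # "server A": miss, queries the database
get_product(shared, 42, fake_db)  # "server B": hit, database untouched
print(len(db_calls))              # 1
```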
Layer 5 — Database-Level Cache
The database engine itself has caches you can tune without changing any application code. MySQL's InnoDB buffer pool is a region of RAM that holds recently accessed pages (both data pages and index pages). When the pages a query needs are already in the buffer pool, no disk I/O occurs and the engine returns results from RAM. Sizing the pool correctly (typically 70–80 % of available RAM on a dedicated database server) is one of the highest-leverage database performance improvements available.
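Whether the pool is large enough shows up in two InnoDB status counters, which SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%' returns: Innodb_buffer_pool_read_requests (logical page reads) and Innodb_buffer_pool_reads (reads the pool could not satisfy and had to fetch from disk). The helpers below are a hypothetical sketch that turns those counters into a hit ratio and applies the sizing rule of thumb from the text (75 % by default); the resulting figure is what you would set as innodb_buffer_pool_size.

```python
def buffer_pool_hit_ratio(read_requests: int, disk_reads: int) -> float:
    """Fraction of logical page reads served from RAM.

    read_requests: value of Innodb_buffer_pool_read_requests
    disk_reads:    value of Innodb_buffer_pool_reads (misses that went to disk)
    """
    if read_requests == 0:
        return 1.0  # nothing requested yet, so nothing has missed
    return 1.0 - disk_reads / read_requests


def suggested_pool_bytes(total_ram_bytes: int, fraction: float = 0.75) -> int:
    """Rule-of-thumb pool size for a dedicated DB host (70-80 % of RAM)."""
    return int(total_ram_bytes * fraction)


GiB = 2**30
print(buffer_pool_hit_ratio(1_000_000, 2_000))   # about 0.998
print(suggested_pool_bytes(64 * GiB) // GiB)     # 48 (GiB on a 64 GiB server)
```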
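Some databases have also offered a query result cache. MySQL's query cache is the best-known example: it was deprecated in MySQL 5.7 and removed in 8.0, largely because its single global lock became a contention bottleneck on busy multi-core servers. (PostgreSQL's plan cache is a different mechanism: it stores execution plans for prepared statements, not query results.) The explicit caching layers above give you more control over what is cached and when it is invalidated.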
Choosing the Right Layer(s)
In practice you use multiple layers simultaneously. A request for a product's details might try the browser cache (0 ms) → miss → the CDN PoP (4 ms) → miss → Redis (1 ms) → miss → the DB buffer pool (2 ms) → miss → disk (10 ms). The goal is for the vast majority of requests to be stopped at one of the earlier, cheaper layers.
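The arithmetic behind "stop most requests early" is a weighted sum: each layer's cost is paid only by the requests that missed every layer above it, so the mean latency is the sum over layers of (probability of reaching the layer) × (its cost). The sketch treats the per-layer figures from the chain above as the cost of consulting each layer; the hit ratios in the example are invented for illustration, not measurements.

```python
from typing import Sequence, Tuple


def expected_latency_ms(layers: Sequence[Tuple[str, float, float]]) -> float:
    """Mean latency of a lookup that falls through the layers in order.

    Each layer is (name, cost_ms, hit_ratio): cost_ms is paid by every request
    that reaches the layer, hit_ratio is the share of those requests it answers.
    The last layer should have hit_ratio 1.0 because it always answers.
    """
    total = 0.0
    reach = 1.0                   # probability a request gets this far
    for _name, cost_ms, hit_ratio in layers:
        total += reach * cost_ms
        reach *= 1.0 - hit_ratio  # only misses continue to the next layer
    return total


# Hypothetical hit ratios; costs taken from the example chain.
LAYERS = [
    ("browser", 0.0, 0.5),
    ("cdn", 4.0, 0.8),
    ("redis", 1.0, 0.9),
    ("buffer pool", 2.0, 0.95),
    ("disk", 10.0, 1.0),
]
print(expected_latency_ms(LAYERS))  # about 2.125
```

Under these assumed ratios the mean is roughly 2.1 ms, against 17 ms for a request that falls all the way through to disk. Raising the hit ratio of an upper layer moves the mean far more than making the disk faster, because so few requests ever reach the disk.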
In the next lesson we will zoom in on how data gets into and out of each cache — the read-through, write-through, and write-behind cache strategies.