System Design Fundamentals

The Building Blocks of a System

Every large-scale system — whether it serves ten million users or processes billions of events per day — is assembled from a small, well-understood set of components. Knowing what each component does, why you would reach for it, and what trade-offs it carries is the difference between a system that holds up under pressure and one that collapses. This lesson gives you that toolbox.

The Canonical Five

Before we dive into each component, look at how they fit together in a typical read-heavy web application:

How the five core building blocks fit together in a typical read-heavy web application.

1. Load Balancer

A load balancer sits in front of a pool of servers and distributes incoming requests so that no single server becomes a bottleneck. It also serves as the single public entry point, hiding internal topology from clients.

Layer 4 (transport): Routes by IP/port — extremely fast, but cannot inspect request content.
Layer 7 (application): Routes by URL, header, or cookie. Slower than Layer 4 because it parses each request, but it can do path-based routing (e.g., /api/* → service A, /static/* → service B) and TLS (SSL) termination.
Algorithms: round-robin, least-connections, IP-hash (sticky sessions), weighted.
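
The selection algorithms listed above can be sketched in a few lines of Python. This is a minimal illustration, not how Nginx or HAProxy implement them. The class names (RoundRobin, LeastConnections, IpHash) and the pick/release methods are hypothetical, and weighted balancing is left out.

```python
import hashlib
import itertools


class RoundRobin:
    """Hand out backends in a fixed rotation."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick(self, client_ip=None):
        return next(self._cycle)


class LeastConnections:
    """Pick the backend with the fewest in-flight requests."""

    def __init__(self, backends):
        self.active = {b: 0 for b in backends}

    def pick(self, client_ip=None):
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1  # caller must call release() when done
        return backend

    def release(self, backend):
        self.active[backend] -= 1


class IpHash:
    """Map a client IP to the same backend on every request."""

    def __init__(self, backends):
        self.backends = list(backends)

    def pick(self, client_ip):
        digest = hashlib.sha256(client_ip.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.backends)
        return self.backends[index]
```

Note that this simple modulo form of IP-hash remaps most clients whenever the backend list changes size, which is one reason production balancers often use consistent hashing instead.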

Health checks matter. A load balancer continuously probes each backend. When a server fails its health check, the load balancer stops sending it traffic — no manual intervention required. This is the foundation of high availability.
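
A minimal sketch of the bookkeeping behind health checks, assuming a hypothetical BackendPool that is told the result of each probe. The threshold of three consecutive failures is an illustrative choice, not a standard value; real load balancers make it configurable.

```python
class BackendPool:
    """Track probe results and expose only the backends considered healthy."""

    def __init__(self, backends, fail_threshold=3):
        self.fail_threshold = fail_threshold
        self.failures = {b: 0 for b in backends}  # consecutive failed probes

    def record_probe(self, backend, ok):
        if ok:
            self.failures[backend] = 0   # one success restores the backend
        else:
            self.failures[backend] += 1

    def healthy(self):
        # Backends at or above the threshold stop receiving traffic.
        return [b for b, n in self.failures.items() if n < self.fail_threshold]
```

The routing algorithm would then choose only from pool.healthy(), so a failed server drops out and rejoins without manual intervention.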

Real examples: AWS ALB (L7), AWS NLB (L4), Nginx, HAProxy, Cloudflare.

2. Cache

A cache is a fast, in-memory data store placed between your application and a slower data source (typically a database). When a read-heavy endpoint fetches the same data thousands of times per second, hitting the database each time is wasteful and unsustainable. A cache absorbs those reads.

Cache-aside (lazy loading): App checks cache first; on a miss, reads from DB and populates the cache. Most common pattern.
Write-through: Every write goes to the cache and the DB in the same operation, and is acknowledged only after both succeed. Cached entries stay in step with the DB, but each write pays extra latency.
Write-behind (write-back): Writes go to cache only; DB is updated asynchronously. Very fast writes, but risk of data loss if cache crashes.
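
The first two patterns can be sketched as follows. This is an illustration under simplifying assumptions: plain dicts stand in for both the cache and the database, and CacheAsideStore with its db_reads counter is a hypothetical name introduced here to make the effect visible.

```python
class CacheAsideStore:
    """Cache-aside reads plus an optional write-through write path."""

    def __init__(self, db):
        self.db = db          # dict standing in for the slow database
        self.cache = {}       # dict standing in for Redis or Memcached
        self.db_reads = 0     # how many reads actually reached the database

    def get(self, key):
        # Cache-aside: check the cache first.
        if key in self.cache:
            return self.cache[key]
        # Miss: read from the database and populate the cache.
        self.db_reads += 1
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def put_write_through(self, key, value):
        # Write-through: update the database and the cache together.
        self.db[key] = value
        self.cache[key] = value
```

Calling get twice for the same key reaches the database only once; the second read is served from the cache.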

Set a TTL (time-to-live) on every cache entry. Without expiry or explicit invalidation, a stale entry can be served indefinitely. A TTL of 5–60 s is common for volatile data; hours or days suit stable reference data (e.g., a product catalogue).

Key metric: cache hit rate (aim for >90% on hot paths). Eviction policy: LRU (least recently used) is the safe default. Real examples: Redis, Memcached, Varnish (HTTP cache).
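
To make TTL, LRU eviction, and hit rate concrete, here is a small in-process sketch. TtlLruCache is a hypothetical class written for this lesson, not the API of Redis or Memcached; the clock parameter is an assumption added so that expiry can be tested without waiting.

```python
import time
from collections import OrderedDict


class TtlLruCache:
    """Bounded cache with per-entry expiry and least-recently-used eviction."""

    def __init__(self, max_entries, ttl_seconds, clock=time.monotonic):
        self.max_entries = max_entries
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._data = OrderedDict()  # key -> (expires_at, value), oldest first
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or entry[0] <= self.clock():
            self._data.pop(key, None)   # drop the entry if it has expired
            self.misses += 1
            return None
        self._data.move_to_end(key)     # mark as most recently used
        self.hits += 1
        return entry[1]

    def set(self, key, value):
        self._data[key] = (self.clock() + self.ttl_seconds, value)
        self._data.move_to_end(key)
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict the least recently used

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```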

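Cache stampede (thundering herd): When a popular cache entry expires, hundreds of concurrent requests can all miss at once and slam the database. Mitigate with probabilistic early expiry (refresh a hot entry shortly before its TTL runs out), a mutex so that only the first request recomputes the value while the others wait, or a distributed lock when several app servers share the cache (e.g., Redis SET with the NX option and an expiry, the modern form of SETNX).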
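
The mutex approach can be sketched for a single process like this. SingleFlightCache and its loader argument are hypothetical names; the sketch assumes threads inside one application server, so it does not replace a distributed lock when many servers share a cache, and it omits TTLs for brevity.

```python
import threading


class SingleFlightCache:
    """On a miss, let one thread per key recompute while the others wait."""

    def __init__(self, loader):
        self.loader = loader                  # slow function: key -> value
        self.cache = {}
        self._locks = {}                      # one lock per key
        self._locks_guard = threading.Lock()  # protects the _locks dict

    def _lock_for(self, key):
        with self._locks_guard:
            return self._locks.setdefault(key, threading.Lock())

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        with self._lock_for(key):
            # Re-check: another thread may have filled the entry while we waited.
            if key in self.cache:
                return self.cache[key]
            value = self.loader(key)
            self.cache[key] = value
            return value
```

With twenty threads requesting the same missing key, the loader runs once and the other nineteen reuse its result.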

3. Database

The database is the system of record — the authoritative, durable store. Its design decisions (relational vs. document vs. columnar, primary vs. replica, sharding vs. federation) shape almost everything else.

Relational (SQL): Strong ACID guarantees, rich query language, best for structured data with complex relationships. PostgreSQL, MySQL, Aurora.
Document (NoSQL): Flexible schema, horizontal scalability, good for hierarchical/JSON data. MongoDB, DynamoDB, Firestore.
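Read replicas: A copy of the primary, kept up to date by replication, that serves read queries. In read-heavy apps, where reads can make up 80-95% of traffic, replicas take most of that load off the primary. Replication is usually asynchronous, so a replica lags slightly (often under 100 ms, more under heavy write load) and a read issued right after a write may return stale data.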
Sharding: Horizontally partition data across multiple database nodes (e.g., users A–M on shard 1, N–Z on shard 2). Massively increases write throughput, but adds significant operational complexity.
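
A hedged sketch of the routing decisions behind read replicas and sharding. The function and class names are hypothetical; range_shard mirrors the A–M / N–Z example above, and hash_shard shows a common alternative that spreads keys more evenly at the cost of range locality.

```python
import hashlib
import itertools


def range_shard(username: str) -> int:
    """Range-based: names starting A-M go to shard 1, everything else to shard 2."""
    first = username[:1].upper()
    return 1 if "A" <= first <= "M" else 2


def hash_shard(user_id: str, shard_count: int) -> int:
    """Hash-based: a stable shard index in the range 0..shard_count-1."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ReadWriteRouter:
    """Send writes to the primary and rotate reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def node_for(self, is_write: bool):
        if is_write or self._replicas is None:
            return self.primary
        return next(self._replicas)
```

Changing shard_count in hash_shard moves most keys to a different shard, which hints at why resharding a live system is operationally expensive.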

Don't prematurely shard. Teams that shard early often regret it, because cross-shard queries, rebalancing, and schema changes all get harder. Start with a primary + read replica. Shard only when you have clear evidence that a single node cannot keep up with write volume.

4. Message Queue

A message queue decouples the component that produces a unit of work from the component that processes it. The producer sends a message and returns immediately; one or more consumers process it asynchronously in the background.

This is valuable whenever you have:

Bursty workloads: A sudden spike of 10,000 image-resize jobs does not overwhelm the resizing service — the queue absorbs the burst and workers drain it at their own pace.
Long-running tasks: Email sending, video encoding, PDF generation — anything too slow to complete synchronously in an HTTP request.
Reliability: If the consumer crashes mid-task, the message is not lost. The broker keeps any message that was not acknowledged and delivers it again.
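
The producer/consumer split can be illustrated with Python's standard-library queue. This is only an in-process sketch: queue.Queue is not durable, so it shows the burst-absorbing behaviour but not the crash-safety that a broker such as SQS or RabbitMQ provides. run_workers and handler are hypothetical names.

```python
import queue
import threading


def run_workers(jobs, handler, worker_count=4):
    """Enqueue every job, then let worker threads drain the queue."""
    q = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            job = q.get()              # blocks until a job is available
            if job is None:            # sentinel: no more work for this worker
                return
            outcome = handler(job)
            with results_lock:
                results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for job in jobs:                   # producer: put() returns immediately
        q.put(job)
    for _ in threads:                  # one sentinel per worker
        q.put(None)
    for t in threads:
        t.join()
    return results
```

For example, sorted(run_workers(range(10), lambda n: n * n)) returns the squares of 0 through 9; results arrive in completion order, which is why the example sorts them.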

Design for idempotency. A message may be delivered more than once (at-least-once semantics is the default in most queues). Your consumer must handle duplicate delivery safely — for example, by checking whether the work was already done before doing it again.
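
One way to handle duplicate delivery is to remember which message IDs have already been processed. The sketch below assumes every message carries a unique ID; IdempotentConsumer is a hypothetical class, and the in-memory set stands in for what would need to be a durable store (a database table or Redis) in a real system.

```python
class IdempotentConsumer:
    """Skip messages whose ID has already been handled."""

    def __init__(self, handler):
        self.handler = handler
        self.processed_ids = set()   # assumption: durable storage in production

    def handle(self, message_id, payload):
        if message_id in self.processed_ids:
            return False             # duplicate delivery: do nothing
        self.handler(payload)
        self.processed_ids.add(message_id)
        return True
```

Because the ID is recorded after the handler runs, a crash between the two steps can still cause the work to repeat, so the handler itself should be safe to run twice or the two steps should share a transaction.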

Real examples: Apache Kafka (event streaming, ordered log), Amazon SQS (simple reliable queues), RabbitMQ (routing / pub-sub), Celery (Python task queue that runs on top of a broker such as RabbitMQ or Redis).

5. CDN (Content Delivery Network)

A CDN is a globally distributed network of edge servers (points of presence, or PoPs) that cache and serve content from a location geographically close to the end user. Instead of every user fetching a 1 MB JavaScript bundle from your origin server in Virginia, a user in Tokyo gets it from a PoP 20 ms away.

Static asset delivery: Images, CSS, JS, fonts — these are the primary use case. Cache-Control headers tell the CDN how long to hold a file.
Dynamic content acceleration: Modern CDNs (Cloudflare, Fastly) can cache API responses with short TTLs, or route dynamic requests over optimised backbone networks.
DDoS protection: CDN edge nodes absorb volumetric attacks before traffic ever reaches your origin.
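
As an illustration of how Cache-Control steers a CDN, the sketch below picks a header value by request path. The policy itself is an assumption made for this example: it presumes static filenames are fingerprinted (so they can be cached for a year), that /api/ responses are not user-specific, and that everything else should be revalidated with the origin. cache_control_for is a hypothetical helper.

```python
STATIC_SUFFIXES = (".js", ".css", ".woff2", ".png", ".jpg")


def cache_control_for(path: str) -> str:
    """Return a Cache-Control value for a response, chosen by path."""
    if path.startswith("/api/"):
        # s-maxage applies to shared caches such as a CDN: a short edge TTL.
        return "public, s-maxage=10"
    if path.endswith(STATIC_SUFFIXES):
        # Fingerprinted assets never change, so cache them for one year.
        return "public, max-age=31536000, immutable"
    # Default: caches must revalidate with the origin before reuse.
    return "no-cache"
```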

CDN edge nodes serve cached assets from nearby PoPs. Only on a cache miss does a request travel back to the origin server.

Real examples: Cloudflare, AWS CloudFront, Fastly, Akamai.

Putting It Together: the Decision Heuristic

When you are designing a system and you reach a bottleneck, ask:

Too many requests for one server? → Add a load balancer and scale horizontally.
Too many repeated reads hitting the database? → Add a cache in front of the DB.
Writes are slow or processing is long? → Move the work to a message queue and process it asynchronously.
Database is the bottleneck? → Add read replicas, then consider sharding if writes are the problem.
Static assets are slow for distant users? → Put them on a CDN.

Real systems layer these components. A production service at scale often uses all five at the same time. Understanding each one — its purpose, its failure modes, and its trade-offs — lets you compose them correctly rather than blindly stacking them.