Scaling & Load Balancing

Vertical vs Horizontal Scaling

Lesson 1 of 10

Every system that succeeds will eventually hit a wall. Users multiply, traffic spikes, and the server that handled a few hundred requests per second starts to buckle under thousands. At that moment, every engineer faces the same fork in the road: scale up (vertical scaling) or scale out (horizontal scaling). Choosing between them, with a clear view of the trade-offs, is one of the most important decisions in system design.

Vertical Scaling (Scale Up)

Vertical scaling means making a single machine bigger and more powerful. You upgrade the CPU from 4 cores to 32 cores, increase the RAM from 32 GB to 256 GB, swap a spinning disk for a high-speed NVMe SSD, or move to a faster network interface. The application keeps running on one node; you just give that node more muscle.

Real-world example: An early-stage startup runs its entire stack — web server, application, and database — on a single AWS t3.medium instance (2 vCPU, 4 GB RAM). As the user base grows to ~50k daily active users, they upgrade to r6i.8xlarge (32 vCPU, 256 GB RAM). No code changes, no new infrastructure — just a larger box.

  • Pros: Simple — no changes to application code, no distributed systems complexity, no inter-node communication overhead.
  • Pros: Low latency — all work stays in-process on one machine.
  • Cons: Hard ceiling, since even the largest AWS instances top out at hundreds of vCPUs and tens of terabytes of RAM, while large systems routinely need more aggregate compute than any single machine can provide.
  • Cons: Single point of failure — if the one machine goes down, the entire service is offline.
  • Cons: Expensive at the high end, where price per core tends to rise faster than capacity (especially for physical servers), so doubling compute can cost well over twice as much.
  • Cons: Downtime during upgrades — resizing a VM or swapping hardware usually requires a restart.
The Ceiling Problem: Google serves billions of searches per day. There is no single computer on earth that could handle that load. Vertical scaling is a useful short-term tactic, but it is structurally incapable of reaching internet scale.

Horizontal Scaling (Scale Out)

Horizontal scaling means adding more machines to share the load. Instead of one powerful server, you run ten (or a hundred, or ten thousand) commodity servers and distribute work across all of them. Each individual machine is modest; the fleet as a whole handles enormous throughput.

Real-world example: Netflix runs on tens of thousands of AWS EC2 instances. No single instance is especially large — many are m5.xlarge or similar. The scale comes from the sheer number of nodes and the infrastructure that routes and balances load across them.
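
To make "distribute work across all of them" concrete, here is a minimal round-robin dispatcher in Python. It is a sketch, not a production load balancer: the server names are hypothetical, and round-robin is only one of several possible distribution strategies.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands each incoming request to the next server in a fixed rotation."""

    def __init__(self, servers: list[str]) -> None:
        if not servers:
            raise ValueError("need at least one server")
        self._rotation = cycle(servers)

    def pick(self) -> str:
        # Each call advances the rotation by exactly one server.
        return next(self._rotation)


# Hypothetical fleet of three modest, identical servers.
balancer = RoundRobinBalancer(["server-1", "server-2", "server-3"])
assignments = [balancer.pick() for _ in range(6)]
print(assignments)
# ['server-1', 'server-2', 'server-3', 'server-1', 'server-2', 'server-3']
```

Adding capacity amounts to adding another name to the list. Every server gets an equal share, which is only a fair split when the servers are roughly identical.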

  • Pros: Theoretically unlimited: add more nodes as demand grows, though shared components such as the database can still become the bottleneck.
  • Pros: High availability — if one node fails, the others absorb its traffic; no single point of failure.
  • Pros: Cost efficiency at scale, since commodity servers are cheaper per unit of compute than top-end machines, and capacity can be added or removed in small increments to track demand.
  • Cons: Complexity — the application must be designed to run correctly on many nodes simultaneously (statelessness, shared state, distributed consistency).
  • Cons: Network overhead — nodes communicate over the network, which introduces latency and potential for partial failures.
  • Cons: Operational cost — you need load balancers, service discovery, health monitoring, and orchestration (Kubernetes, etc.).
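
The high-availability benefit can be sketched by adding health status to the rotation. Requests skip any node marked down, so the surviving nodes absorb its share. This is a minimal sketch with hypothetical names. How a failure is detected (health checks, timeouts) is assumed and left out.

```python
class HealthAwareBalancer:
    """Round-robin over servers, skipping any that are marked unhealthy."""

    def __init__(self, servers: list[str]) -> None:
        self.servers = list(servers)
        self.healthy = set(servers)
        self._next = 0

    def mark_down(self, server: str) -> None:
        self.healthy.discard(server)

    def mark_up(self, server: str) -> None:
        if server in self.servers:
            self.healthy.add(server)

    def pick(self) -> str:
        # Try each server at most once per call, continuing from the last position.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


balancer = HealthAwareBalancer(["server-1", "server-2", "server-3"])
balancer.mark_down("server-2")  # simulate a node failure
served = [balancer.pick() for _ in range(4)]
print(served)  # ['server-1', 'server-3', 'server-1', 'server-3']
```

Service continues through the failure, but the remaining two nodes each carry half the traffic instead of a third. This is why fleets are usually sized with some spare capacity.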
Start vertical, plan horizontal: For most products in the early stages, vertical scaling is the pragmatic choice — it is faster, simpler, and requires no architectural change. The goal is to design your application to be stateless from day one so that horizontal scaling becomes a config change (add more instances) rather than a rewrite.

Side-by-Side Diagram

[Figure: Vertical vs Horizontal Scaling, side by side. Vertical (scale up): one server grows from 4 vCPU / 8 GB (before) to 32 vCPU / 256 GB (after, same machine), a single point of failure with a hard ceiling. Horizontal (scale out): a load balancer distributes requests across Servers 1, 2, and 3 (4 vCPU each), more servers can be added indefinitely, and a node failure is tolerated.]
Vertical scaling grows a single machine; horizontal scaling multiplies nodes behind a load balancer.

Where Things Get Tricky: State

The biggest challenge with horizontal scaling is state. Consider a simple shopping cart stored in the server's local memory. With one server, every request hits the same in-memory store — it just works. With three servers behind a load balancer, the user's second request might land on a different server that has no knowledge of the cart. The state is lost.
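
The failure is easy to reproduce in a few lines. In this sketch each of three hypothetical servers keeps its own in-memory cart dictionary, and requests are routed round-robin. The user's second request lands on a server that never saw the first.

```python
from itertools import cycle


class AppServer:
    """An app server that (problematically) keeps carts in local memory."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.carts: dict[str, list[str]] = {}  # visible to this process only

    def add_to_cart(self, user: str, item: str) -> None:
        self.carts.setdefault(user, []).append(item)

    def get_cart(self, user: str) -> list[str]:
        return self.carts.get(user, [])


servers = [AppServer(f"server-{i}") for i in range(1, 4)]
route = cycle(servers)  # a round-robin load balancer

next(route).add_to_cart("alice", "book")  # request 1 -> server-1
cart = next(route).get_cart("alice")      # request 2 -> server-2
print(cart)  # [] because server-2 has never seen alice's cart
```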

This is why horizontal scaling and statelessness are deeply linked. Application servers must be made stateless — they should hold no session or user data in local memory. All shared state must live in an external system: a database, a distributed cache like Redis, or a session store. Each server can then handle any request independently.
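
Here is the same setup made stateless. The servers hold no cart data and read and write a single shared store instead. A plain dictionary stands in for the external store; in production this would be something like Redis or a database. So this sketches the pattern, not any particular client API. The `SharedStore` class and the `cart:<user>` key format are hypothetical.

```python
from itertools import cycle


class SharedStore:
    """Hypothetical stand-in for an external store such as Redis or a database."""

    def __init__(self) -> None:
        self._data: dict[str, list[str]] = {}

    def append(self, key: str, value: str) -> None:
        self._data.setdefault(key, []).append(value)

    def get(self, key: str) -> list[str]:
        return list(self._data.get(key, []))  # return a copy, as a network read would


class StatelessAppServer:
    """Keeps no user data locally; every request goes through the shared store."""

    def __init__(self, name: str, store: SharedStore) -> None:
        self.name = name
        self.store = store

    def add_to_cart(self, user: str, item: str) -> None:
        self.store.append(f"cart:{user}", item)

    def get_cart(self, user: str) -> list[str]:
        return self.store.get(f"cart:{user}")


store = SharedStore()
servers = [StatelessAppServer(f"server-{i}", store) for i in range(1, 4)]
route = cycle(servers)

next(route).add_to_cart("alice", "book")  # request 1 -> server-1
next(route).add_to_cart("alice", "pen")   # request 2 -> server-2
cart = next(route).get_cart("alice")      # request 3 -> server-3
print(cart)  # ['book', 'pen']
```

Because any server can answer any request, the load balancer is free to route anywhere, and servers can be added or removed without losing carts.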

Sticky Sessions are a Band-Aid: Some load balancers offer "sticky sessions" — routing a given user to the same backend server every time. This appears to solve the state problem without changing the application. In practice it creates hot-spots (one server gets much more load), breaks when that server fails, and makes deployments harder. Treat sticky sessions as a temporary workaround, not a design.
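
One simple way to implement stickiness is to hash a client identifier onto the server list. Many load balancers use a cookie instead, so treat this only as an illustration. The sketch shows the failure mode described above. When a server drops out, the modulus changes, and users are routed to servers that never held their session. The function name and user IDs are hypothetical.

```python
import hashlib


def sticky_route(user_id: str, servers: list[str]) -> str:
    """Map a user to a server by hashing the user ID (hash-mod-N stickiness)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


servers = ["server-1", "server-2", "server-3"]
users = [f"user-{i}" for i in range(1000)]

before = {u: sticky_route(u, servers) for u in users}
# While the fleet is unchanged, a user always lands on the same server.
assert all(sticky_route(u, servers) == before[u] for u in users)

# Simulate server-2 failing: route over the remaining servers only.
survivors = ["server-1", "server-3"]
after = {u: sticky_route(u, survivors) for u in users}

moved = sum(before[u] != after[u] for u in users)
print(f"{moved} of {len(users)} users now hit a server without their session")
```

With hash-mod-N routing, the users of the failed server lose their sessions. So do many users whose servers were healthy, because the whole mapping shifts.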

The Numbers: When Does Each Strategy Make Sense?

There are no universal thresholds, but these rough guidelines hold across many production systems:

  • 0 – 10k requests/day: A single small instance is fine. Vertical scaling to a mid-tier machine is the fastest path.
  • 10k – 1M requests/day: One large vertical instance often works, but start eliminating local state now to be ready.
  • 1M – 100M requests/day: Horizontal scaling of stateless application servers, a dedicated database tier, possibly read replicas and caching.
  • 100M+ requests/day: Full horizontal distribution at every tier — load balancers, application servers, caches, sharded databases, CDN at the edge.
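
Daily totals can hide how modest the per-second load is. Dividing by the 86,400 seconds in a day gives the average rate. The peak-to-average factor used below (3×) is an assumption for illustration, since real traffic shapes vary widely.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def requests_per_second(requests_per_day: float, peak_factor: float = 3.0) -> tuple[float, float]:
    """Return (average RPS, estimated peak RPS) for a daily request volume.

    peak_factor is an assumed peak-to-average ratio, not a universal constant.
    """
    average = requests_per_day / SECONDS_PER_DAY
    return average, average * peak_factor


for daily in (10_000, 1_000_000, 100_000_000):
    avg, peak = requests_per_second(daily)
    print(f"{daily:>11,} req/day -> {avg:8.1f} req/s average, ~{peak:8.1f} req/s peak")
```

On average, 10k requests/day is about 0.12 req/s, 1M/day about 11.6 req/s, and 100M/day about 1,157 req/s. The boundaries in the list therefore depend as much on how expensive each request is as on the raw count.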

In Practice: Most Systems Use Both

Real production architectures do not choose one strategy and abandon the other. The typical pattern is:

  1. Vertically scale the database as far as it can go (databases are hard to distribute without significant trade-offs).
  2. Horizontally scale the stateless application tier (web servers, API servers) — these are easy to replicate.
  3. Vertically scale individual horizontal nodes when more per-node power is needed, while also adding more nodes.
[Figure: Hybrid scaling architecture. Clients → Load Balancer (horizontal, active-active) → App Servers 1, 2, and 3 (stateless, scale out) → Primary Database (scale up, larger instance).]
Hybrid approach: stateless application servers scale out; the database scales up.
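
The hybrid pattern leads to a simple sizing question for the stateless tier: how many app servers cover the peak? Here is a rough sketch. It assumes a measured per-server capacity (the 200 req/s figure below is hypothetical) and one spare node, so the tier survives a single failure (N+1).

```python
import math


def app_servers_needed(peak_rps: float, per_server_rps: float, spare: int = 1) -> int:
    """Servers needed to carry peak_rps, plus `spare` nodes for failover (N+1)."""
    if per_server_rps <= 0:
        raise ValueError("per_server_rps must be positive")
    return max(1, math.ceil(peak_rps / per_server_rps)) + spare


# Hypothetical: ~3,500 req/s at peak, each stateless server handles ~200 req/s.
print(app_servers_needed(3_500, 200))  # 19
```

The database tier does not fit this formula. You cannot simply add more primaries, which is why the pattern scales the database up instead.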

Key Takeaways

  • Vertical scaling is simple but has a hard ceiling and a single point of failure.
  • Horizontal scaling is theoretically unlimited but requires stateless services and added infrastructure.
  • The two strategies are complementary — real systems use both at different tiers.
  • Making application servers stateless from day one is the single most important preparation for horizontal scale.
Coming up next: Lesson 2 digs into exactly what stateless vs stateful services mean in practice — the design patterns, the pitfalls, and how to move state out of your application servers.