Vertical vs Horizontal Scaling
Every successful system eventually hits a wall. Users multiply, traffic spikes, and the server that handled a few hundred requests per second starts to buckle under thousands. At that point, engineers face a fork in the road: scale up (vertical scaling) or scale out (horizontal scaling). Choosing between them, with a clear view of the trade-offs, is one of the most important decisions in system design.
Vertical Scaling (Scale Up)
Vertical scaling means making a single machine bigger and more powerful. You upgrade the CPU from 4 cores to 32 cores, grow the RAM from 32 GB to 256 GB, swap the spinning disk for a high-speed NVMe SSD, or move to a faster network interface. The application keeps running on one node; you just give that node more muscle.
Real-world example: An early-stage startup runs its entire stack — web server, application, and database — on a single AWS t3.medium instance (2 vCPU, 4 GB RAM). As the user base grows to ~50k daily active users, they upgrade to r6i.8xlarge (32 vCPU, 256 GB RAM). No code changes, no new infrastructure — just a larger box.
- Pros: Simple — no changes to application code, no distributed systems complexity, no inter-node communication overhead.
- Pros: Low latency — all work stays in-process on one machine.
- Cons: Hard ceiling — a single machine can only get so big. The largest AWS instances at the time of writing offered roughly 448 vCPU and 24 TB of RAM, and workloads at the largest scale exceed what any one machine can provide.
- Cons: Single point of failure — if the one machine goes down, the entire service is offline.
- Cons: Expensive non-linearity — at the high end, price rises faster than capacity. Top-bin CPUs and the densest memory configurations carry a premium, so doubling capacity can cost well over twice as much.
- Cons: Downtime during upgrades — resizing a VM or swapping hardware usually requires a restart.
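To put a rough number on the single-point-of-failure risk, the sketch below compares expected yearly downtime for one machine against a service that stays up as long as at least one of several machines is up. The 99% per-machine availability is an assumed figure for illustration, and the formula assumes machines fail independently, which real outages often violate.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours


def expected_downtime_hours(availability):
    """Expected hours per year that a service with this availability is down."""
    return (1 - availability) * HOURS_PER_YEAR


def redundant_availability(node_availability, nodes):
    """Availability when the service is up as long as at least one node is up.

    Assumes node failures are independent of each other.
    """
    return 1 - (1 - node_availability) ** nodes


# Assumed figure: each machine is available 99% of the time.
single = expected_downtime_hours(0.99)
triple = expected_downtime_hours(redundant_availability(0.99, 3))

print(f"one machine:    {single:.2f} hours of downtime per year")
print(f"three machines: {triple:.5f} hours of downtime per year")
```

Under these assumptions one machine is down for about 87.6 hours a year, while three redundant machines are all down together for well under a minute. Correlated failures (a bad deploy, a zone outage) narrow that gap considerably in practice.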
Horizontal Scaling (Scale Out)
Horizontal scaling means adding more machines to share the load. Instead of one powerful server, you run ten (or a hundred, or ten thousand) commodity servers and distribute work across all of them. Each individual machine is modest; the fleet as a whole handles enormous throughput.
Real-world example: Netflix runs on tens of thousands of AWS EC2 instances. No single instance is especially large — many are m5.xlarge or similar. The scale comes from the sheer number of nodes and the infrastructure that routes and balances load across them.
- Pros: Theoretically unlimited — add more nodes as demand grows.
- Pros: High availability — if one node fails, the others absorb its traffic; no single point of failure.
- Pros: Cost efficiency at scale — commodity hardware or mid-size cloud instances are often cheaper per unit of compute than top-end single machines.
- Cons: Complexity — the application must be designed to run correctly on many nodes simultaneously (statelessness, shared state, distributed consistency).
- Cons: Network overhead — nodes communicate over the network, which introduces latency and potential for partial failures.
- Cons: Operational cost — you need load balancers, service discovery, health monitoring, and orchestration (Kubernetes, etc.).
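The load-distribution and failover behavior described above can be sketched with a toy round-robin balancer. This is a minimal illustration, not how any particular load balancer is implemented; the class name, the node names, and the manual `mark_down` call (standing in for a real health check) are all hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out healthy nodes in rotation, skipping any marked as down."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._ring = cycle(self.nodes)

    def mark_down(self, node):
        # In a real system a health checker would call this automatically.
        self.healthy.discard(node)

    def mark_up(self, node):
        if node in self.nodes:
            self.healthy.add(node)

    def pick(self):
        """Return the next healthy node in rotation."""
        if not self.healthy:
            raise RuntimeError("no healthy nodes available")
        # One full lap of the ring is always enough to find a healthy node.
        for _ in range(len(self.nodes)):
            node = next(self._ring)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes available")


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
first_lap = [balancer.pick() for _ in range(3)]

balancer.mark_down("app-2")  # simulate a node failure
after_failure = [balancer.pick() for _ in range(4)]

print(first_lap)      # all three nodes take a turn
print(after_failure)  # traffic now alternates between the two survivors
```

Even this toy version hints at the operational cost: something has to detect failures, update the healthy set, and itself stay available.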
Where Things Get Tricky: State
The biggest challenge with horizontal scaling is state. Consider a simple shopping cart stored in the server's local memory. With one server, every request hits the same in-memory store — it just works. With three servers behind a load balancer, the user's second request might land on a different server that has no knowledge of the cart. The state is lost.
This is why horizontal scaling and statelessness are deeply linked. Application servers must be made stateless — they should hold no session or user data in local memory. All shared state must live in an external system: a database, a distributed cache like Redis, or a session store. Each server can then handle any request independently.
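The shopping cart problem and its fix can be shown in a few lines. In this sketch a plain dictionary stands in for the external store (Redis or a database in a real deployment), and the class names are hypothetical.

```python
class LocalStateServer:
    """Keeps carts in its own memory, so other servers cannot see them."""

    def __init__(self):
        self.carts = {}

    def add_item(self, user, item):
        self.carts.setdefault(user, []).append(item)

    def get_cart(self, user):
        return list(self.carts.get(user, []))


class StatelessServer:
    """Holds no cart data itself; every read and write goes to a shared store."""

    def __init__(self, store):
        self.store = store  # stand-in for Redis, a database, or a session store

    def add_item(self, user, item):
        self.store.setdefault(user, []).append(item)

    def get_cart(self, user):
        return list(self.store.get(user, []))


# Three servers with local state: the second request lands on another server.
local_fleet = [LocalStateServer() for _ in range(3)]
local_fleet[0].add_item("alice", "book")
lost_cart = local_fleet[1].get_cart("alice")

# Three stateless servers sharing one external store.
shared_store = {}
stateless_fleet = [StatelessServer(shared_store) for _ in range(3)]
stateless_fleet[0].add_item("alice", "book")
found_cart = stateless_fleet[1].get_cart("alice")

print(lost_cart)   # []
print(found_cart)  # ['book']
```

The trade-off is that every cart operation now costs a network round trip to the store, and the store itself becomes a component that must be scaled and kept available.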
The Numbers: When Does Each Strategy Make Sense?
There are no universal thresholds, but these rough guidelines hold across many production systems:
- 0 – 10k requests/day: A single small instance is fine. Vertical scaling to a mid-tier machine is the fastest path.
- 10k – 1M requests/day: One large vertical instance often works, but start eliminating local state now to be ready.
- 1M – 100M requests/day: Horizontal scaling of stateless application servers, a dedicated database tier, possibly read replicas and caching.
- 100M+ requests/day: Full horizontal distribution at every tier — load balancers, application servers, caches, sharded databases, CDN at the edge.
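Daily totals are easier to reason about when converted to requests per second. The sketch below does that conversion for the tier boundaries above; the peak factor of 3 is an assumed multiplier for illustration, since real traffic is rarely spread evenly across the day.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rps(requests_per_day):
    """Average requests per second if traffic were spread evenly over a day."""
    return requests_per_day / SECONDS_PER_DAY


def peak_rps(requests_per_day, peak_factor=3.0):
    """Rough peak estimate; peak_factor is an assumed ratio of peak to average."""
    return average_rps(requests_per_day) * peak_factor


tier_boundaries = [10_000, 1_000_000, 100_000_000]
for boundary in tier_boundaries:
    print(
        f"{boundary:>11,} req/day = "
        f"{average_rps(boundary):8.2f} avg rps, "
        f"{peak_rps(boundary):8.2f} peak rps (assumed 3x)"
    )
```

For instance, 100M requests/day averages out to roughly 1,157 requests per second, which is why that tier calls for distribution at every layer.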
In Practice: Most Systems Use Both
Real production architectures do not choose one strategy and abandon the other. The typical pattern is:
- Vertically scale the database as far as it can go (databases are hard to distribute without significant trade-offs).
- Horizontally scale the stateless application tier (web servers, API servers) — these are easy to replicate.
- Vertically scale the individual nodes within a horizontal tier when more per-node power is needed, while also adding more nodes.
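The interplay between node size and node count in the application tier can be sketched as a simple sizing calculation. Every number here is hypothetical: the per-node throughput figures, the 70% target utilization, and the two-node minimum are assumptions chosen for illustration, not measurements.

```python
import math


def nodes_needed(peak_rps, per_node_rps, target_utilization=0.7, min_nodes=2):
    """Estimate how many nodes are needed to serve peak_rps.

    per_node_rps is the throughput one node sustains at full load.
    target_utilization leaves headroom so nodes are not run flat out.
    min_nodes keeps at least two nodes so one failure does not cause an outage.
    """
    usable_rps = per_node_rps * target_utilization
    return max(min_nodes, math.ceil(peak_rps / usable_rps))


# Same peak load served by small nodes versus vertically scaled nodes.
small_nodes = nodes_needed(peak_rps=3_000, per_node_rps=250)
large_nodes = nodes_needed(peak_rps=3_000, per_node_rps=1_000)

print(f"small nodes needed: {small_nodes}")
print(f"large nodes needed: {large_nodes}")
```

With these assumed figures, the same load needs 18 small nodes or 5 larger ones. Fewer, larger nodes are simpler to operate, but each failure removes a bigger share of capacity.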
Key Takeaways
- Vertical scaling is simple but has a hard ceiling and a single point of failure.
- Horizontal scaling is theoretically unlimited but requires stateless services and added infrastructure.
- The two strategies are complementary — real systems use both at different tiers.
- Making application servers stateless from day one is the single most important preparation for horizontal scale.