Scaling Databases
Scaling Databases
At some point every database hits a wall. Queries slow down, CPU pegs at 100%, disk I/O saturates, or the primary simply cannot absorb another thousand writes per second. The correct response depends entirely on which wall you hit. Throwing money at the wrong solution is one of the most common and expensive mistakes in production database operations — and scaling a database the wrong way can create years of technical debt that is far harder to unwind than the original performance problem.
This lesson walks through the three tiers of the scaling playbook in the order you should actually reach for them: vertical scaling first, read scaling second, sharding only when you genuinely have no other choice.
Tier 1 — Vertical Scaling: The Fastest Lever You Have
Vertical scaling means moving to a larger host: more CPU cores, more RAM, faster NVMe disks. It is the right first move because it requires zero application changes, zero data migration, and often resolves the problem immediately. Its limits are real but frequently reached later than engineers expect.
What vertical scaling actually buys you:
- RAM: The single most impactful resource for most OLTP workloads. PostgreSQL's
shared_buffersand MySQL'sinnodb_buffer_pool_sizecache hot data pages. When your working set fits in RAM, disk reads drop to near zero. A database on a 128 GB host often performs 5–10× better than on a 32 GB host running the same queries. - CPU: Matters for complex analytical queries, connection handling, and sort/hash operations. Doubling CPU cores rarely doubles throughput for write-heavy OLTP — the bottleneck is usually I/O or lock contention, not compute.
- Disk: Switching from spinning HDD to NVMe SSD is often a 10–50× improvement in random read/write latency. If your host is still on EBS gp2 (AWS), upgrading to gp3 or io2 with provisioned IOPS is a no-downtime change that frequently eliminates I/O wait entirely.
Innodb_buffer_pool_reads climbing faster than Innodb_buffer_pool_read_requests, you are disk-bound, and more RAM will fix it before any architectural change. Run this on PostgreSQL to see it instantly:
The real ceiling of vertical scaling is not the instance size — it is the cost curve and single-node SPOF risk. A 192-core, 24 TB RAM machine exists in major clouds, but costs $40k+/month. More practically, a single node means a single failure domain. The combination of cost ceiling and HA requirements is what genuinely forces the move to the next tier.
Tier 2 — Read Scaling: Adding Read Replicas
If your workload is read-heavy (a ratio of 5:1 or higher reads to writes is typical for web applications), and your primary is CPU-bound despite adequate RAM and fast disks, read replicas are the correct next step. The primary handles all writes; replicas serve reads using streaming replication (PostgreSQL) or binary log replication (MySQL). A load balancer or smart connection string routes read traffic to the replica pool.
Production-grade patterns differ from the toy examples in tutorials:
- Replica lag is a real hazard. Under heavy write load, replicas can fall behind the primary by seconds or more. Reading stale data after a write is a class of bug — called a read-your-own-writes inconsistency — that is extremely hard to debug in production. Route session-critical reads (profile settings, payment status) to the primary, and only route analytics, dashboards, and search to replicas.
- Use separate connection pools for each endpoint. PgBouncer or ProxySQL should pool connections to the primary separately from connections to the replica pool. Never mix them in a single pool.
- Monitor replica lag as a first-class SLO. Alert when lag exceeds your tolerance (typically 1–5 seconds for web apps, milliseconds for financial data). In PostgreSQL, this is
pg_stat_replication; in MySQL it is theSeconds_Behind_Sourcefield inSHOW REPLICA STATUS. - Read replicas do not reduce primary write load. If your bottleneck is write throughput, replicas solve nothing — you are at the sharding decision point.
Tier 3 — Sharding: When You Actually Need It
Sharding — horizontally partitioning data across multiple independent database nodes, each owning a distinct subset of rows — is the nuclear option. It solves write saturation and storage limits that no single host can address. It also introduces a class of complexity that has ended engineering careers and caused production outages that took days to recover from. Before you shard, you need to be honest with yourself about whether you have exhausted every alternative.
Signs you have genuinely reached the sharding threshold:
- Write throughput that exceeds the maximum of the largest available instance (typically >50k writes/sec for MySQL, >100k TPS for PostgreSQL on NVMe)
- Dataset larger than the maximum storage of a single node (hundreds of TB)
- Legal or compliance requirements to partition data by geography or tenant
Alternatives to check before sharding: table partitioning (built into PostgreSQL and MySQL, zero application changes, handles datasets up to tens of TB elegantly), CQRS with a separate read model (Elasticsearch, read-only Redshift), moving time-series data to a purpose-built store (TimescaleDB, InfluxDB), and archiving cold rows to object storage (AWS S3 + Athena).
Sharding Strategies
Range sharding assigns rows to shards by a contiguous range of the shard key (e.g., user IDs 1–10M on shard 1, 10M–20M on shard 2). Simple to reason about; terrible for hotspots — if your newest users are most active, all traffic hits the highest-range shard.
Hash sharding applies a hash function to the shard key and assigns the row to hash(key) % N shards. Even distribution; resharding when you add nodes requires moving data (consistent hashing reduces this but does not eliminate it).
Directory-based sharding maintains an explicit lookup table mapping each key to a shard. Maximum flexibility; the lookup table becomes a single point of failure and a write bottleneck if not carefully managed.
Managed Sharding: What the Platform Does For You
At big-tech scale, sharding is usually handled by a middleware tier rather than in application code directly. The dominant options in 2025:
- Vitess — MySQL sharding layer used by YouTube, Slack, PlanetScale. Handles resharding, cross-shard queries, and connection pooling transparently. Steep operational complexity.
- Citus (PostgreSQL) — native distributed PostgreSQL extension. Distributes tables across worker nodes; transparent to most SQL. Strong fit for multi-tenant SaaS and analytics.
- CockroachDB / YugabyteDB / Spanner — distributed SQL engines that shard internally. You write standard SQL; the engine handles sharding, replication, and failover. Trade: higher latency (distributed consensus) and higher cost than a single-node Postgres.
The Decision Framework: Which Tier, Right Now?
Engineers routinely jump to sharding when the correct answer was more RAM or a read replica. Use this checklist before escalating to the next tier:
- Is the buffer hit rate below 95%? → Add RAM first.
- Is the primary CPU-bound on reads, not writes? → Add read replicas.
- Is the write throughput saturated? → Check table partitioning, CQRS, and archiving before sharding.
- Have you profiled and eliminated slow queries? A missing index fixed in five minutes has resolved incidents that were being diagnosed as capacity problems.
- Only after all of the above → Evaluate sharding with Vitess, Citus, or a distributed SQL engine.