Scaling & Load Balancing

Scaling Reads with Replicas

18 min Lesson 7 of 10

Scaling Reads with Replicas

Most production web applications are read-heavy. A typical e-commerce site might execute 10 product-page reads for every 1 order write. A social feed might serve 100 timeline reads for every 1 new post. When a single database node can no longer keep up with that read volume, read replicas are the first line of defence — and often the most cost-effective one.

The Single-Node Bottleneck

Every relational database server has a ceiling: CPU cores, RAM for the buffer pool, disk I/O bandwidth, and network throughput. When read queries saturate those resources, every query slows down — including writes. You can add indexes, tune queries, and upgrade hardware, but eventually you hit a wall. The next step is to distribute the read load across multiple database instances.

Rule of thumb: A well-tuned single MySQL or PostgreSQL node on modern hardware can handle roughly 5,000–15,000 simple read queries per second. Beyond that, you need replicas — or a fundamentally different architecture.

How Replication Works

Database replication is the process of continuously copying data from one node — the primary (also called leader, master, or source) — to one or more replicas (also called followers, standbys, or read replicas). The primary accepts both reads and writes. Replicas accept reads only.

The replication mechanism differs by database engine but the concept is the same everywhere:

  1. A write arrives at the primary. The primary commits the change to its own storage and records it in a replication log (called the binlog in MySQL, the WAL — Write-Ahead Log — in PostgreSQL).
  2. Each replica maintains a connection to the primary and streams that log continuously.
  3. The replica applies the same operations to its own copy of the data, staying in sync.
  4. Reads directed at a replica are served entirely from the replica's local storage — the primary is not involved at all.
Primary-replica replication topology with read/write split Application Writes Reads Primary DB Reads + Writes Read LB / Proxy Round-robin to replicas Replica 1 Reads only Replica 2 Reads only Replica 3 Reads only WAL / binlog
Read/write split: the application sends writes to the primary and distributes reads across replicas through a proxy or read load balancer.

Synchronous vs. Asynchronous Replication

How quickly a replica catches up with the primary is governed by the replication mode:

  • Asynchronous (default in most systems): The primary commits a write and acknowledges the client immediately, without waiting for any replica to apply the change. Replicas catch up in the background — typically within milliseconds on a LAN, but potentially seconds or minutes under heavy load or a slow network. This is the highest-throughput option but introduces replication lag.
  • Synchronous (semi-sync in MySQL, synchronous standby in PostgreSQL): The primary waits for at least one replica to confirm it received the write before acknowledging the client. This eliminates data loss on primary failure but adds write latency equal to at least one network round trip.
  • Semi-synchronous: A practical middle ground — one replica must confirm, the rest are async. Used heavily in high-availability MySQL setups.
Replication lag is a consistency hazard. If you write a record and immediately read it back from a replica that hasn't caught up yet, you will get stale data — or worse, a not found result. This is called read-your-writes inconsistency. Always route reads that must be fresh (e.g., immediately after a user submits a form) back to the primary.

Implementing Read/Write Split

Most modern stacks provide at least one of these mechanisms to separate read and write connections:

  • Application-level routing: Your ORM or database library supports multiple connection strings. Writes use a write connection; reads use a read connection pool. Laravel's read/write connections config is a standard example — you list one or more read hosts and a write host; the framework routes automatically.
  • Proxy-level routing: A middleware proxy such as ProxySQL (MySQL) or PgBouncer + pgpool-II (PostgreSQL) sits between your app and the database. It inspects SQL statements and routes SELECT statements to replicas and INSERT/UPDATE/DELETE/BEGIN to the primary. The application sees a single connection string.
  • Managed cloud replicas: Amazon RDS, Google Cloud SQL, and Azure Database all offer read replicas as a one-click feature. The cloud console gives you a separate read endpoint hostname. AWS Aurora goes further, providing a single reader endpoint that load-balances across all replicas automatically.
Best practice: Use a proxy (ProxySQL, RDS Proxy) in production rather than application-level routing. The proxy handles failover — if the primary fails and a replica is promoted, the proxy updates its routing table and your application code does not change at all.

How Many Replicas Do You Need?

Start with numbers. Suppose your primary handles 8,000 read queries per second (QPS) and you want to stay under 60% CPU utilisation for headroom. At 60% your primary is comfortable at roughly 5,000 QPS. If each replica has the same hardware capacity:

  • Total read load: 8,000 QPS
  • Capacity per node at 60%: 5,000 QPS
  • Replicas needed: ⌈8,000 / 5,000⌉ = 2 replicas, leaving the primary for writes only

In practice, add at least one extra replica as a hot standby and to absorb traffic spikes. Large platforms like GitHub and Shopify run dozens of replicas per shard — not just for read throughput, but also for regional proximity (putting a replica in the EU to serve European users with lower latency) and for analytical workloads (running heavy GROUP BY reports on a replica so they never touch the primary).

Capacity planning: read load distribution across primary and replicas No Replicas With 2 Read Replicas Primary DB CPU: 90% ⚠ 8,000 QPS (reads) + all writes Slow queries, timeouts, outage risk Primary DB CPU: 25% Writes only Replica 1 CPU: 48% 4,000 QPS Replica 2 CPU: 48% 4,000 QPS Reads split evenly, primary free for writes
Left: a single overloaded primary. Right: the same write load on the primary and 8,000 read QPS split across two replicas — each node stays well under capacity.

Replica Lag: Measuring and Managing It

Replication lag is the delay between a write being committed on the primary and being visible on a replica. You can observe it directly:

-- MySQL: seconds behind primary SHOW REPLICA STATUS\G -- Look for: Seconds_Behind_Source: 0 -- PostgreSQL: lag in bytes and time SELECT client_addr, write_lag, flush_lag, replay_lag FROM pg_stat_replication;

Strategies to keep lag low:

  • Parallel replication: MySQL 8.0+ and PostgreSQL 10+ can apply the replication log using multiple threads in parallel, dramatically reducing lag under high write rates.
  • Replica sizing: Give replicas the same (or more) RAM as the primary so the buffer pool is equally warm.
  • Avoid heavy analytics queries on replicas under heavy lag — a long-running SELECT can hold locks that stall replication.
  • Monitor lag as a SLO metric — alert at > 5 seconds, page at > 30 seconds.

Failover: When the Primary Dies

A replica is not just a read-scaling tool — it is your recovery path when the primary fails. In a high-availability setup, one replica is designated the hot standby. When the primary goes down, an automatic failover process promotes the standby to primary within seconds, updates the DNS or proxy routing table, and the rest of the replicas re-connect to the new primary.

Tools that automate this: Orchestrator (MySQL), Patroni (PostgreSQL), Amazon RDS Multi-AZ (fully managed, ~60-second automatic failover). Cloud-managed replicas with automatic failover are now the standard for any production system that cannot tolerate multi-minute downtime.

Replicas are not backups. A replica will faithfully apply a DROP TABLE or a corrupted row written to the primary. Always maintain a separate periodic backup (e.g., mysqldump, pg_dump, or snapshot-based) in addition to your replicas.

Trade-offs at a Glance

  • Benefit — read throughput: Linear horizontal scaling of reads. Add another replica, add another 5,000–15,000 QPS of read capacity.
  • Benefit — geographic distribution: A replica in each region serves local users with sub-10 ms database latency instead of cross-ocean trips.
  • Benefit — HA: Replicas double as standbys, turning hardware failures into seconds of downtime instead of hours.
  • Cost — write scalability: Replication does not help writes. Every write still goes to one primary. For write-heavy workloads you need sharding (Lesson 8).
  • Cost — consistency: Async replication means replicas may lag. Applications must be designed to tolerate eventual consistency on read paths, or route sensitive reads to the primary.
  • Cost — operational complexity: More nodes mean more monitoring, more failover logic, more connection management. Use a proxy or managed service to absorb that complexity.
Start simple: One primary + one read replica handles 90% of read-scaling needs for early-stage products. You do not need a fleet of replicas until your single-node query throughput genuinely becomes the bottleneck — and that moment is measurable with SHOW STATUS LIKE 'Threads_running' or equivalent.