Scaling Reads with Replicas
Scaling Reads with Replicas
Most production web applications are read-heavy. A typical e-commerce site might execute 10 product-page reads for every 1 order write. A social feed might serve 100 timeline reads for every 1 new post. When a single database node can no longer keep up with that read volume, read replicas are the first line of defence — and often the most cost-effective one.
The Single-Node Bottleneck
Every relational database server has a ceiling: CPU cores, RAM for the buffer pool, disk I/O bandwidth, and network throughput. When read queries saturate those resources, every query slows down — including writes. You can add indexes, tune queries, and upgrade hardware, but eventually you hit a wall. The next step is to distribute the read load across multiple database instances.
How Replication Works
Database replication is the process of continuously copying data from one node — the primary (also called leader, master, or source) — to one or more replicas (also called followers, standbys, or read replicas). The primary accepts both reads and writes. Replicas accept reads only.
The replication mechanism differs by database engine but the concept is the same everywhere:
- A write arrives at the primary. The primary commits the change to its own storage and records it in a replication log (called the binlog in MySQL, the WAL — Write-Ahead Log — in PostgreSQL).
- Each replica maintains a connection to the primary and streams that log continuously.
- The replica applies the same operations to its own copy of the data, staying in sync.
- Reads directed at a replica are served entirely from the replica's local storage — the primary is not involved at all.
Synchronous vs. Asynchronous Replication
How quickly a replica catches up with the primary is governed by the replication mode:
- Asynchronous (default in most systems): The primary commits a write and acknowledges the client immediately, without waiting for any replica to apply the change. Replicas catch up in the background — typically within milliseconds on a LAN, but potentially seconds or minutes under heavy load or a slow network. This is the highest-throughput option but introduces replication lag.
- Synchronous (semi-sync in MySQL, synchronous standby in PostgreSQL): The primary waits for at least one replica to confirm it received the write before acknowledging the client. This eliminates data loss on primary failure but adds write latency equal to at least one network round trip.
- Semi-synchronous: A practical middle ground — one replica must confirm, the rest are async. Used heavily in high-availability MySQL setups.
Implementing Read/Write Split
Most modern stacks provide at least one of these mechanisms to separate read and write connections:
- Application-level routing: Your ORM or database library supports multiple connection strings. Writes use a
writeconnection; reads use areadconnection pool. Laravel's read/write connections config is a standard example — you list one or morereadhosts and awritehost; the framework routes automatically. - Proxy-level routing: A middleware proxy such as ProxySQL (MySQL) or PgBouncer + pgpool-II (PostgreSQL) sits between your app and the database. It inspects SQL statements and routes
SELECTstatements to replicas andINSERT/UPDATE/DELETE/BEGINto the primary. The application sees a single connection string. - Managed cloud replicas: Amazon RDS, Google Cloud SQL, and Azure Database all offer read replicas as a one-click feature. The cloud console gives you a separate read endpoint hostname. AWS Aurora goes further, providing a single reader endpoint that load-balances across all replicas automatically.
How Many Replicas Do You Need?
Start with numbers. Suppose your primary handles 8,000 read queries per second (QPS) and you want to stay under 60% CPU utilisation for headroom. At 60% your primary is comfortable at roughly 5,000 QPS. If each replica has the same hardware capacity:
- Total read load: 8,000 QPS
- Capacity per node at 60%: 5,000 QPS
- Replicas needed: ⌈8,000 / 5,000⌉ = 2 replicas, leaving the primary for writes only
In practice, add at least one extra replica as a hot standby and to absorb traffic spikes. Large platforms like GitHub and Shopify run dozens of replicas per shard — not just for read throughput, but also for regional proximity (putting a replica in the EU to serve European users with lower latency) and for analytical workloads (running heavy GROUP BY reports on a replica so they never touch the primary).
Replica Lag: Measuring and Managing It
Replication lag is the delay between a write being committed on the primary and being visible on a replica. You can observe it directly:
Strategies to keep lag low:
- Parallel replication: MySQL 8.0+ and PostgreSQL 10+ can apply the replication log using multiple threads in parallel, dramatically reducing lag under high write rates.
- Replica sizing: Give replicas the same (or more) RAM as the primary so the buffer pool is equally warm.
- Avoid heavy analytics queries on replicas under heavy lag — a long-running
SELECTcan hold locks that stall replication. - Monitor lag as a SLO metric — alert at > 5 seconds, page at > 30 seconds.
Failover: When the Primary Dies
A replica is not just a read-scaling tool — it is your recovery path when the primary fails. In a high-availability setup, one replica is designated the hot standby. When the primary goes down, an automatic failover process promotes the standby to primary within seconds, updates the DNS or proxy routing table, and the rest of the replicas re-connect to the new primary.
Tools that automate this: Orchestrator (MySQL), Patroni (PostgreSQL), Amazon RDS Multi-AZ (fully managed, ~60-second automatic failover). Cloud-managed replicas with automatic failover are now the standard for any production system that cannot tolerate multi-minute downtime.
DROP TABLE or a corrupted row written to the primary. Always maintain a separate periodic backup (e.g., mysqldump, pg_dump, or snapshot-based) in addition to your replicas.
Trade-offs at a Glance
- Benefit — read throughput: Linear horizontal scaling of reads. Add another replica, add another 5,000–15,000 QPS of read capacity.
- Benefit — geographic distribution: A replica in each region serves local users with sub-10 ms database latency instead of cross-ocean trips.
- Benefit — HA: Replicas double as standbys, turning hardware failures into seconds of downtime instead of hours.
- Cost — write scalability: Replication does not help writes. Every write still goes to one primary. For write-heavy workloads you need sharding (Lesson 8).
- Cost — consistency: Async replication means replicas may lag. Applications must be designed to tolerate eventual consistency on read paths, or route sensitive reads to the primary.
- Cost — operational complexity: More nodes mean more monitoring, more failover logic, more connection management. Use a proxy or managed service to absorb that complexity.
SHOW STATUS LIKE 'Threads_running' or equivalent.