Scaling & Load Balancing

Load Balancing Algorithms

18 min Lesson 4 of 10

Load Balancing Algorithms

You have a pool of servers and a stream of incoming requests. Which server handles which request? The answer seems trivial until you realise that the wrong choice will send 80% of traffic to one machine while others idle, spike latency for session-heavy users when they land on a different node, or cause a cache full of warm data to go cold. Load balancing algorithms are the rules that govern this assignment — and choosing the right one for your workload is a meaningful architectural decision.

This lesson covers the three algorithms you will encounter on virtually every production system: Round Robin, Least Connections, and IP / Consistent Hashing. We examine how each works mechanically, what numbers and conditions it performs well under, where it breaks, and how to decide between them.

Round Robin

Round Robin is the simplest possible algorithm: serve request 1 to server A, request 2 to server B, request 3 to server C, then wrap around. Every server gets an equal share of requests by count.

A Weighted Round Robin variant assigns a numeric weight to each server. A machine with weight 3 receives three requests for every one sent to a machine with weight 1. This is the standard way to mix servers of different sizes in the same pool (for example, a 32-core node alongside two 8-core nodes).

When it shines: Round Robin performs excellently when requests are roughly uniform in processing cost and servers are homogeneous. A typical REST API where each endpoint does a lightweight DB read is a near-perfect fit. Netflix and Cloudflare use weighted round robin extensively at their ingress tiers where request cost variance is low.

The critical flaw is that Round Robin is blind to server load. Imagine server B is handling a 30-second batch export while server A and C are idle. Round Robin keeps sending requests to B at the same rate. The algorithm counts requests, not work.

Round Robin cycles through servers in order — equal request counts, but zero awareness of actual server load.

Least Connections

Least Connections (also called Least Outstanding Requests) routes each new request to the server with the fewest currently active connections. The load balancer tracks a live counter per server, increments it when a connection opens, and decrements it when the connection closes.

Consider four servers with active connection counts of 12, 45, 8, 31. The next request goes to server 3 (count: 8). After that server responds, its count drops back to 8 — or to 9 if another request arrives before the first finishes.

Best-fit workload: Any service where request duration varies significantly — file uploads/downloads, streaming, database-heavy endpoints, long-polling. A single 20-second video transcode should not block 20 other requests from landing on the same machine. Least Connections naturally compensates: that server's count spikes to 1 or 2 while others sit at 0, so all subsequent short requests route away from it.

Weighted Least Connections combines the two ideas: a server with weight 4 and 20 active connections is considered equivalent to a server with weight 1 and 5 connections (both have effective load 5 per unit of capacity). This is the standard configuration for mixed-capacity pools in production HAProxy and AWS ALB setups.

The trade-off: Least Connections requires the load balancer to maintain state (the connection counters). In a cluster of multiple load balancers, those counters must be synchronised or each balancer operates with a local approximation. At very high connection rates (>500k/sec), the counter update path can itself become a bottleneck. For the vast majority of systems this is not a real-world concern.

Pitfall — Slow Start: When you add a brand-new server to the pool, its connection count is 0, so Least Connections floods it with all incoming traffic before it is warm. HAProxy and Nginx Plus both have a "slow-start" parameter that ramps up new servers gradually — always enable it when deploying behind Least Connections.

IP Hashing & Consistent Hashing

Both Round Robin and Least Connections are stateless from the routing perspective — consecutive requests from the same client can land on different servers. For applications that store session state in memory (old-school PHP sessions, WebSocket connections, in-process caches), that is a problem: the user's cart, game state, or authentication token is not there.

IP Hashing solves this with a deterministic function: server = hash(client_ip) % N where N is the number of servers. The same IP always maps to the same server, giving you sticky sessions without any cookie or token overhead.

The fatal weakness appears when N changes. If you add or remove a server, every hash value modulo N changes, and virtually every existing client is remapped to a different server. All in-memory state is lost simultaneously — a "thundering herd" of cache misses.

Consistent Hashing solves this rebalancing problem elegantly. Imagine a hash space of 0 – 2³² − 1 arranged as a ring. Each server is placed at one or more points on the ring (its virtual nodes). An incoming request is hashed to a point on the ring, then assigned to the first server encountered clockwise. When a server is added or removed, only the keys that fall between it and its predecessor on the ring need to move — roughly 1/N of all keys rather than all of them.

Consistent hashing places servers on a ring; each request routes to the nearest server clockwise. Adding or removing a server reshuffles only a fraction of keys.

Virtual nodes (vnodes) are the key to making consistent hashing work evenly in practice. Instead of placing server A at one point on the ring, you place it at 100–150 points (each a different hash of "Server-A-1", "Server-A-2", …). This smooths out the key distribution so no single server ends up with a disproportionate arc of the ring. Cassandra and Amazon DynamoDB both use this vnode approach — DynamoDB typically assigns 64–128 vnodes per physical host.

Where consistent hashing shines: distributed caches (Memcached, Redis Cluster), database sharding routers, CDN edge node selection, and any stateful service where a client must be "sticky" to a specific server but pool membership changes frequently.

Side-by-Side Comparison

Algorithm selection matrix: choose based on workload uniformity, statefulness, and expected pool change frequency.

Choosing the Right Algorithm

In practice, most systems combine algorithms at different tiers rather than picking just one:

Stateless microservices (API, auth, search): Round Robin with health checks. Simple, low overhead, horizontally scalable. Use weighted variant when server sizes differ.
Mixed or unpredictable request cost (video, file, DB-heavy): Least Connections (or Least Outstanding Requests in HTTP/2-aware load balancers like AWS ALB). Automatically compensates for slow requests.
Distributed caches, sharded databases, session stores: Consistent hashing. The cost of remapping keys on pool change is far higher than the complexity of the algorithm.
WebSocket / long-lived connections: IP Hash or a cookie-based sticky session on top of Round Robin. The goal is connection continuity, not load evenness.

Real-world defaults: AWS ALB defaults to Least Outstanding Requests for HTTP. HAProxy defaults to Round Robin but ships Least Connections and Consistent Hash as first-class options. Nginx Open Source supports only Round Robin and IP Hash natively; Nginx Plus adds Least Connections and sticky sessions.

The algorithm is one variable in a larger set of load-balancing decisions: health check intervals, session persistence, connection draining during deploys, and circuit breakers all interact with the routing algorithm. A great algorithm paired with a missing health check will still send traffic to a dead server. Keep the full picture in mind as we move into Lesson 5: Health Checks and Failover.