Load Balancing Algorithms
You have a pool of servers and a stream of incoming requests. Which server handles which request? The answer seems trivial until you realise that a poor choice can send 80% of traffic to one machine while others idle, spike latency for session-heavy users who land on a different node, or turn a cache full of warm data cold. Load balancing algorithms are the rules that govern this assignment, and choosing the right one for your workload is a meaningful architectural decision.
This lesson covers the three algorithm families you will encounter on virtually every production system: Round Robin, Least Connections, and hashing (IP Hashing and Consistent Hashing). We examine how each works mechanically, the conditions under which it performs well, where it breaks, and how to decide between them.
Round Robin
Round Robin is the simplest possible algorithm: serve request 1 to server A, request 2 to server B, request 3 to server C, then wrap around. Every server gets an equal share of requests by count.
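The rotation described above can be sketched in a few lines. This is a minimal illustration, not a production balancer: the server names and the `next_server` helper are assumptions made for the example.

```python
from itertools import cycle

# Hypothetical pool; a real balancer would get this from configuration
# or service discovery.
servers = ["A", "B", "C"]
rotation = cycle(servers)

def next_server() -> str:
    """Return the next server in strict rotation, wrapping at the end."""
    return next(rotation)

assignments = [next_server() for _ in range(7)]
print(assignments)  # ['A', 'B', 'C', 'A', 'B', 'C', 'A']
```

Note that the only state is the position in the rotation; nothing about the servers themselves is consulted.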
A Weighted Round Robin variant assigns a numeric weight to each server. A machine with weight 3 receives three requests for every one sent to a machine with weight 1. This is the standard way to mix servers of different sizes in the same pool (for example, a 32-core node alongside two 8-core nodes).
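One possible way to implement weighting is a "smooth" interleaving, sketched below, which spreads the heavier server's turns through the cycle instead of sending them back to back. The class name, server names, and the 4:1:1 weights (chosen to mirror a 32-core node next to two 8-core nodes) are assumptions for illustration; real load balancers may implement weighting differently.

```python
from collections import Counter

class SmoothWeightedRoundRobin:
    """Weighted rotation that interleaves picks rather than bursting them."""

    def __init__(self, weights: dict[str, int]):
        self.weights = dict(weights)
        self.total = sum(self.weights.values())
        # Running score per server; the highest score wins each round.
        self.current = {name: 0 for name in self.weights}

    def next_server(self) -> str:
        # Every server earns its weight each round...
        for name, weight in self.weights.items():
            self.current[name] += weight
        # ...and the winner pays back the total, so it waits its turn again.
        chosen = max(self.current, key=self.current.get)
        self.current[chosen] -= self.total
        return chosen

# Assumed weights: one large node and two small ones.
balancer = SmoothWeightedRoundRobin(
    {"big-32core": 4, "small-8core-1": 1, "small-8core-2": 1}
)
counts = Counter(balancer.next_server() for _ in range(600))
print(counts)  # big-32core receives 4 of every 6 requests
```

Over any full cycle of six requests the large node handles four and each small node handles one, which matches the weight ratio.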
The critical flaw is that Round Robin is blind to server load. Imagine server B is handling a 30-second batch export while servers A and C are idle. Round Robin keeps sending requests to B at the same rate as before. The algorithm counts requests, not work.
Least Connections
Least Connections (also called Least Outstanding Requests) routes each new request to the server with the fewest currently active connections. The load balancer tracks a live counter per server, increments it when a connection opens, and decrements it when the connection closes.
Consider four servers with active connection counts of 12, 45, 8, and 31. The next request goes to server 3, raising its count from 8 to 9. When that request completes, the count drops back to 8, or stays at 9 if another request was assigned to the server in the meantime.
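The counter bookkeeping can be sketched as follows, using the four connection counts from the example. The class and method names (`LeastConnectionsBalancer`, `acquire`, `release`) and the server names are hypothetical, and the sketch ignores concurrency, which a real balancer would have to handle.

```python
class LeastConnectionsBalancer:
    """Tracks active connections per server and picks the least busy one."""

    def __init__(self, active: dict[str, int]):
        self.active = dict(active)

    def acquire(self) -> str:
        # Choose the server with the fewest active connections,
        # then count the new connection against it.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        # Called when a connection closes.
        self.active[server] -= 1

lb = LeastConnectionsBalancer(
    {"server-1": 12, "server-2": 45, "server-3": 8, "server-4": 31}
)
chosen = lb.acquire()
print(chosen, lb.active[chosen])  # server-3 9
lb.release(chosen)
print(lb.active[chosen])          # 8
```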
Weighted Least Connections combines the two ideas: a server with weight 4 and 20 active connections is considered equivalent to a server with weight 1 and 5 connections (both carry an effective load of 5 connections per unit of weight). This is a common configuration for mixed-capacity pools; HAProxy's leastconn algorithm, for example, takes server weights into account.
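The effective-load comparison can be sketched like this. The `Server` dataclass, the `pick` helper, and the third "medium" server are assumptions added for the example; the first two servers use the numbers from the paragraph above.

```python
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    weight: int
    active: int = 0

    @property
    def effective_load(self) -> float:
        # Active connections per unit of weight.
        return self.active / self.weight

def pick(servers: list[Server]) -> Server:
    """Return the server with the lowest effective load."""
    return min(servers, key=lambda s: s.effective_load)

pool = [
    Server("large", weight=4, active=20),   # 20 / 4 = 5.0
    Server("small", weight=1, active=5),    #  5 / 1 = 5.0
    Server("medium", weight=2, active=8),   #  8 / 2 = 4.0
]
chosen = pick(pool)
chosen.active += 1
print(chosen.name)  # medium
```

Although "medium" has more raw connections than "small", it has spare capacity relative to its weight, so it receives the next request.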
The trade-off: Least Connections requires the load balancer to maintain state (the connection counters). In a cluster of multiple load balancers, those counters must be synchronised or each balancer operates with a local approximation. At very high connection rates (>500k/sec), the counter update path can itself become a bottleneck. For the vast majority of systems this is not a real-world concern.
IP Hashing & Consistent Hashing
Neither Round Robin nor Least Connections provides client affinity: consecutive requests from the same client can land on different servers. For applications that store session state in memory (old-school PHP sessions, state tied to a WebSocket connection, in-process caches), that is a problem: the user's cart, game state, or authentication token is not on the server that receives the next request.
IP Hashing solves this with a deterministic function: server = hash(client_ip) % N where N is the number of servers. The same IP always maps to the same server, giving you sticky sessions without any cookie or token overhead.
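A minimal sketch of the formula follows. It uses SHA-256 from `hashlib` as the hash function, which is an assumption for the example (any stable hash works; Python's built-in `hash()` is avoided because it is randomised per process for strings). The server names and the sample address are also hypothetical.

```python
import hashlib

def pick_server(client_ip: str, servers: list[str]) -> str:
    """Map a client IP to a server with hash(client_ip) % N."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]

servers = ["app-1", "app-2", "app-3"]
first = pick_server("203.0.113.7", servers)
second = pick_server("203.0.113.7", servers)
print(first == second)  # True: the same IP always maps to the same server
```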
The fatal weakness appears when N changes. If you add or remove a server, most hash values produce a different result modulo the new N, so the large majority of existing clients are remapped to a different server (going from N to N + 1 servers moves roughly N/(N + 1) of them). Most in-memory state is lost simultaneously: a "thundering herd" of cache misses.
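The scale of the remapping is easy to measure. The sketch below hashes 10,000 synthetic client addresses (an assumption for the example, as is the choice of SHA-256) and counts how many change server when the pool grows from 4 to 5.

```python
import hashlib

def bucket(key: str, n: int) -> int:
    """Return hash(key) % n using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n

# Synthetic client IPs, purely for illustration.
keys = [f"10.0.{i // 256}.{i % 256}" for i in range(10_000)]

moved = sum(1 for k in keys if bucket(k, 4) != bucket(k, 5))
moved_fraction = moved / len(keys)

# A key stays put only when h % 4 == h % 5, which holds for 4 of every
# 20 consecutive hash values, so about 80% of keys are expected to move.
print(f"{moved_fraction:.1%} of clients changed server")
```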
Consistent Hashing solves this rebalancing problem elegantly. Imagine a hash space of 0 – 2³² − 1 arranged as a ring. Each server is placed at one or more points on the ring (its virtual nodes). An incoming request is hashed to a point on the ring, then assigned to the first server encountered clockwise. When a server is added or removed, only the keys that fall between it and its predecessor on the ring need to move — roughly 1/N of all keys rather than all of them.
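A minimal ring with one point per server can be sketched with a sorted list and binary search. The `HashRing` class, the server and key names, and the use of the first 4 bytes of SHA-256 as a 32-bit ring position are all assumptions for illustration.

```python
import bisect
import hashlib

def ring_hash(value: str) -> int:
    """Map a string to a position in 0 .. 2**32 - 1."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")

class HashRing:
    def __init__(self, servers):
        self._owner = {}    # ring position -> server
        self._points = []   # sorted ring positions
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        point = ring_hash(server)
        self._owner[point] = server
        bisect.insort(self._points, point)

    def lookup(self, key: str) -> str:
        # First server at or after the key's position, wrapping past the end.
        idx = bisect.bisect_left(self._points, ring_hash(key))
        return self._owner[self._points[idx % len(self._points)]]

ring = HashRing(["server-A", "server-B", "server-C"])
keys = [f"user-{i}" for i in range(10_000)]
before = {k: ring.lookup(k) for k in keys}

ring.add("server-D")
after = {k: ring.lookup(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.1%} of keys moved")
print(all(after[k] == "server-D" for k in moved))  # True
```

Every key that moved went to the new server; no key was shuffled between the existing servers. With a single point per server the size of the moved share depends on where the new point happens to land, which is the unevenness that virtual nodes address.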
Virtual nodes (vnodes) are the key to making consistent hashing work evenly in practice. Instead of placing server A at one point on the ring, you place it at 100–150 points (each a different hash of "Server-A-1", "Server-A-2", …). This smooths out the key distribution so no single server ends up with a disproportionate arc of the ring. Apache Cassandra uses this vnode approach (configured through its num_tokens setting), and the design described in Amazon's Dynamo paper relies on virtual nodes as well.
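The smoothing effect can be observed by comparing each server's share of keys with 1 point per server against 150 points per server. This is a sketch under assumptions: the helper names, the key set, and the SHA-256-based 32-bit ring position are illustrative, and the vnode labels follow the "Server-A-1" pattern mentioned above.

```python
import bisect
import hashlib
from collections import Counter

def ring_hash(value: str) -> int:
    """Map a string to a position in 0 .. 2**32 - 1."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")

def build_ring(servers, vnodes):
    """Place each server at `vnodes` points; return sorted positions and owners."""
    points = sorted(
        (ring_hash(f"{server}-{i}"), server)
        for server in servers
        for i in range(1, vnodes + 1)
    )
    return [h for h, _ in points], [s for _, s in points]

def lookup(ring, key):
    hashes, owners = ring
    # First vnode at or after the key's position, wrapping past the end.
    idx = bisect.bisect_left(hashes, ring_hash(key)) % len(hashes)
    return owners[idx]

def shares(servers, vnodes, keys):
    """Fraction of keys owned by each server for a given vnode count."""
    ring = build_ring(servers, vnodes)
    counts = Counter(lookup(ring, k) for k in keys)
    return {s: counts[s] / len(keys) for s in servers}

servers = ["Server-A", "Server-B", "Server-C", "Server-D"]
keys = [f"user-{i}" for i in range(20_000)]

single = shares(servers, 1, keys)    # one point per server: often lopsided
many = shares(servers, 150, keys)    # 150 vnodes each: close to 25% apiece
for name in servers:
    print(f"{name}: 1 vnode {single[name]:.1%}, 150 vnodes {many[name]:.1%}")
```

With 150 vnodes each server's share should sit close to the ideal 25%, while the single-point ring typically shows a much wider spread.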
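Where consistent hashing shines: distributed caches (for example, Memcached clients that use ketama-style hashing), database sharding routers, CDN edge node selection, and any stateful service where a client must be "sticky" to a specific server but pool membership changes frequently. Redis Cluster addresses the same problem with a related scheme: a fixed set of 16,384 hash slots that are assigned to nodes.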
Side-by-Side Comparison
Choosing the Right Algorithm
In practice, most systems combine algorithms at different tiers rather than picking just one:
- Stateless microservices (API, auth, search): Round Robin with health checks. Simple, low overhead, horizontally scalable. Use weighted variant when server sizes differ.
- Mixed or unpredictable request cost (video, file, DB-heavy): Least Connections (or Least Outstanding Requests in HTTP/2-aware load balancers like AWS ALB). Automatically compensates for slow requests.
- Distributed caches, sharded databases, session stores: Consistent hashing. The cost of remapping keys on pool change is far higher than the complexity of the algorithm.
- WebSocket / long-lived connections: IP Hash or a cookie-based sticky session on top of Round Robin. The goal is connection continuity, not load evenness.
The algorithm is one variable in a larger set of load-balancing decisions: health check intervals, session persistence, connection draining during deploys, and circuit breakers all interact with the routing algorithm. A great algorithm paired with a missing health check will still send traffic to a dead server. Keep the full picture in mind as we move into Lesson 5: Health Checks and Failover.