Client-Side Load Balancing
In a traditional monolith you connect to a single database host or a single downstream service. In a microservices environment the same logical service runs as multiple identical instances — three pods of order-service, five pods of inventory-service, and so on. Something must decide which instance gets each request. That decision can happen in two places: at a dedicated load-balancer process sitting in between (server-side load balancing), or inside the calling service itself (client-side load balancing). This lesson is about the latter.
Server-Side vs Client-Side Load Balancing
Server-side load balancing means every outbound call goes to a fixed virtual IP or DNS name. An external device (an HAProxy, an AWS ALB, an Nginx upstream block) owns the instance list and the algorithm. The client knows only one address.
Client-side load balancing moves that responsibility into the caller. The caller:
- Fetches the live instance list from the service registry (Eureka, in our stack).
- Applies a selection algorithm locally to pick one instance.
- Sends the HTTP request directly to that instance's host and port.
The advantage is fewer network hops and full control over the routing algorithm. The trade-off is that every client must be smart enough to maintain a local cache of instances, handle stale entries, and retry on failure.
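The three steps and the retry obligation can be sketched in plain Java with no Spring dependency. This is a minimal illustration under stated assumptions, not Spring Cloud LoadBalancer's implementation: the Instance record, the ClientSideCaller class, and the idea of passing the request in as a function are all hypothetical names chosen for the example.

```java
import java.util.List;
import java.util.function.Function;

/** One registered instance of a logical service (hypothetical type for this sketch). */
record Instance(String host, int port) {}

/** Keeps a local copy of the instance list and rotates through it, retrying on failure. */
final class ClientSideCaller {
    private final List<Instance> cachedInstances; // local copy of the registry's list
    private int next = 0;                         // not thread-safe; kept simple on purpose

    ClientSideCaller(List<Instance> cachedInstances) {
        this.cachedInstances = List.copyOf(cachedInstances);
    }

    /** Tries each cached instance at most once, starting at the current rotation point. */
    <T> T call(Function<Instance, T> request) {
        RuntimeException last = null;
        for (int attempt = 0; attempt < cachedInstances.size(); attempt++) {
            Instance target = cachedInstances.get(next);
            next = (next + 1) % cachedInstances.size();
            try {
                return request.apply(target); // send directly to the chosen instance
            } catch (RuntimeException e) {
                last = e; // stale or crashed instance: move on to the next one
            }
        }
        throw new IllegalStateException("no instance answered", last);
    }
}
```

A caller would pass a function that performs the actual HTTP request against the given host and port. If every cached instance fails, the sketch gives up with an exception instead of retrying forever.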
Adding the Dependency
Client-side load balancing is bundled in the Spring Cloud starter for the service discovery client. If you are already using the Eureka client starter (spring-cloud-starter-netflix-eureka-client), you have everything you need.
The spring-cloud-starter-loadbalancer artifact is a transitive dependency of the Eureka client starter and is pulled in automatically. You do not need to add it separately unless you are using a different discovery mechanism.
How Spring Cloud LoadBalancer Works
When you build an HTTP client using WebClient (reactive) or RestClient / RestTemplate (servlet), you annotate the @Bean that builds it with @LoadBalanced. Spring Cloud wraps the resulting client with an interceptor. At call time that interceptor:
- Detects a hostname that matches a registered service name (e.g. http://order-service/...).
- Queries the local ServiceInstanceListSupplier cache, which is periodically refreshed from Eureka.
- Runs the configured algorithm — round-robin by default — to select an instance.
- Rewrites the URL to the real host:port of the chosen instance and proceeds.
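The four interceptor steps can be modeled in a few lines of plain Java. This is a sketch of the idea only, assuming a static map in place of the Eureka-backed cache. RewritingInterceptor and its resolve method are hypothetical names, not part of Spring Cloud.

```java
import java.net.URI;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Rewrites http://service-name/... to http://host:port/... using round-robin selection. */
final class RewritingInterceptor {
    // serviceId -> "host:port" entries; stands in for the locally cached instance list
    private final Map<String, List<String>> instancesByService;
    // one rotation counter per service, so services do not disturb each other's order
    private final Map<String, AtomicInteger> positions = new ConcurrentHashMap<>();

    RewritingInterceptor(Map<String, List<String>> instancesByService) {
        this.instancesByService = Map.copyOf(instancesByService);
    }

    URI resolve(URI original) {
        String serviceId = original.getHost();
        List<String> instances = serviceId == null ? null : instancesByService.get(serviceId);
        if (instances == null || instances.isEmpty()) {
            return original; // host is not a known service name: leave the URL untouched
        }
        AtomicInteger position = positions.computeIfAbsent(serviceId, id -> new AtomicInteger());
        // floorMod keeps the index valid even after the counter overflows
        String hostAndPort = instances.get(Math.floorMod(position.getAndIncrement(), instances.size()));
        String query = original.getRawQuery() == null ? "" : "?" + original.getRawQuery();
        return URI.create(original.getScheme() + "://" + hostAndPort + original.getRawPath() + query);
    }
}
```

With two instances registered for order-service, successive calls to resolve alternate between them, while the path and query string pass through unchanged.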
Using a Load-Balanced WebClient
The reactive WebClient is the recommended HTTP client for Spring Boot 3 services. Configure a load-balanced builder bean once in a @Configuration class by declaring a WebClient.Builder @Bean method annotated with @LoadBalanced.
Inject the builder into any service bean that needs to call another service. Use the logical service name exactly as it is registered in Eureka; casing matters.
Do not append a port to the service name (for example, http://inventory-service:8082). LoadBalancer replaces the entire host portion; adding a port makes the substitution ambiguous and can break at runtime.
Using a Load-Balanced RestClient (Spring Boot 3.2+)
RestClient is the synchronous alternative introduced in Spring Framework 6.1. You get the same @LoadBalanced annotation pattern: annotate a RestClient.Builder bean with @LoadBalanced and build your clients from it.
Changing the Load-Balancing Algorithm
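The default algorithm is round-robin: requests are distributed evenly across all healthy instances in rotation. Spring Cloud LoadBalancer also ships a random algorithm (RandomLoadBalancer). You switch per service by providing a custom configuration class that declares a ReactorLoadBalancer<ServiceInstance> bean and wiring it via @LoadBalancerClient.
Then reference that config class on the application entry point or any @Configuration class: set the annotation's name attribute to the service ID (inventory-service here) and its configuration attribute to the custom class.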
All calls to inventory-service now use random selection; calls to every other service still use round-robin.
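The effect of a per-service override can be illustrated with a small plain-Java model: one selection strategy is registered for a named service, and every other service falls back to round-robin. Chooser, RoundRobinChooser, RandomChooser, and ChooserRegistry are hypothetical names for this sketch and do not correspond to Spring Cloud types.

```java
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Selection strategy over a non-empty instance list (hypothetical interface). */
interface Chooser {
    String choose(List<String> instances);
}

/** Hands out instances in rotation. */
final class RoundRobinChooser implements Chooser {
    private final AtomicInteger position = new AtomicInteger();

    @Override
    public String choose(List<String> instances) {
        // floorMod keeps the index valid even after the counter overflows
        return instances.get(Math.floorMod(position.getAndIncrement(), instances.size()));
    }
}

/** Picks a uniformly random instance on every call. */
final class RandomChooser implements Chooser {
    private final Random random = new Random();

    @Override
    public String choose(List<String> instances) {
        return instances.get(random.nextInt(instances.size()));
    }
}

/** Resolves the strategy for a service: explicit override first, round-robin otherwise. */
final class ChooserRegistry {
    private final Map<String, Chooser> overrides;
    private final Map<String, Chooser> defaults = new ConcurrentHashMap<>();

    ChooserRegistry(Map<String, Chooser> overrides) {
        this.overrides = Map.copyOf(overrides);
    }

    Chooser forService(String serviceId) {
        Chooser override = overrides.get(serviceId);
        if (override != null) {
            return override;
        }
        // each service gets its own round-robin counter, created on first use
        return defaults.computeIfAbsent(serviceId, id -> new RoundRobinChooser());
    }
}
```

Registering a RandomChooser under inventory-service mirrors the configuration described above: only that service's calls are affected.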
Instance Cache and Health Filtering
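Spring Cloud LoadBalancer caches the instance list in a ServiceInstanceListSupplier pipeline. By default cached entries expire after 35 seconds, so the list is re-fetched from the registry at most that often. You can tune this in application.yml with the spring.cloud.loadbalancer.cache.ttl property.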
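To show what a time-to-live on the instance list means in practice, here is a minimal plain-Java cache that re-queries its source only after the TTL has elapsed. It is a sketch of the concept, not the framework's cache. CachedInstanceList is a hypothetical name, and the clock is injected as an assumption so the behavior can be checked without waiting.

```java
import java.time.Duration;
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Caches the registry's answer and refreshes it once the TTL has elapsed. */
final class CachedInstanceList {
    private final Supplier<List<String>> registry; // stands in for the call to the registry
    private final long ttlMillis;
    private final LongSupplier clockMillis;        // injected so tests can control time
    private List<String> cached = List.of();
    private long fetchedAt;
    private boolean loaded;

    CachedInstanceList(Supplier<List<String>> registry, Duration ttl, LongSupplier clockMillis) {
        this.registry = registry;
        this.ttlMillis = ttl.toMillis();
        this.clockMillis = clockMillis;
    }

    synchronized List<String> get() {
        long now = clockMillis.getAsLong();
        if (!loaded || now - fetchedAt >= ttlMillis) {
            cached = List.copyOf(registry.get()); // entry missing or expired: fetch again
            fetchedAt = now;
            loaded = true;
        }
        return cached; // may be up to one TTL out of date
    }
}
```

In production code the clock argument would typically be System::currentTimeMillis. A shorter TTL narrows the window in which a removed instance can still be selected, at the cost of more registry lookups.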
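The supplier pipeline also supports a health-check filter. Enable it by setting spring.cloud.loadbalancer.configurations to health-check in application.yml; the filter then periodically probes each instance's health endpoint and excludes instances that fail the check.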
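The pipeline idea, one supplier wrapping another and narrowing its result, can be sketched in plain Java as a decorator. Candidate and HealthFilteringSupplier are hypothetical names. How an instance is judged healthy is deliberately left as a predicate, because this sketch does not model the actual health probe.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** One candidate instance (hypothetical type for this sketch). */
record Candidate(String host, int port) {}

/** Wraps another supplier and passes through only the instances judged healthy. */
final class HealthFilteringSupplier implements Supplier<List<Candidate>> {
    private final Supplier<List<Candidate>> delegate; // previous stage in the pipeline
    private final Predicate<Candidate> isHealthy;     // assumed health verdict per instance

    HealthFilteringSupplier(Supplier<List<Candidate>> delegate, Predicate<Candidate> isHealthy) {
        this.delegate = delegate;
        this.isHealthy = isHealthy;
    }

    @Override
    public List<Candidate> get() {
        // the selection algorithm downstream only ever sees the filtered list
        return delegate.get().stream().filter(isHealthy).toList();
    }
}
```

Because the filter runs before selection, the round-robin or random step never has to know about health at all.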
Security Consideration: mTLS Across Instances
Client-side load balancing means your service opens a direct TCP connection to each instance. In a production cluster, those connections should be encrypted and mutually authenticated — otherwise a compromised internal service could impersonate any instance. The common patterns are:
- Service mesh (Istio / Linkerd): mTLS is handled transparently at the sidecar proxy layer; your application code sees plain HTTP.
- Spring Boot TLS client config: Configure a truststore and keystore on the WebClient/RestClient for direct mTLS without a mesh.
- JWT propagation: Pass the caller's bearer token downstream on every service-to-service call so downstream services can enforce authorization independently.
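For the JWT propagation pattern, the essential step is copying the caller's bearer token onto the outbound request. The sketch below uses the JDK's java.net.http API instead of WebClient or RestClient so that it runs without Spring. TokenPropagation and withBearer are hypothetical names, and token validation is assumed to happen elsewhere.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Builds outbound requests that carry the caller's bearer token. */
final class TokenPropagation {
    static HttpRequest withBearer(URI target, String callerToken) {
        return HttpRequest.newBuilder(target)
                // forward the token unchanged so the downstream service can authorize the call
                .header("Authorization", "Bearer " + callerToken)
                .GET()
                .build();
    }
}
```

The target URI here would be the already-resolved instance address, so the token travels with every direct instance-to-instance call.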
Summary
Client-side load balancing with Spring Cloud LoadBalancer gives each service a local, registry-backed router. You annotate a WebClient.Builder or RestClient.Builder bean with @LoadBalanced, address downstream services by their Eureka service name, and the framework resolves and rotates across real instances at call time. Round-robin is the default; random selection and custom algorithms are wired per service through @LoadBalancerClient. Tune the instance cache TTL, enable health filtering, and pair with a retry mechanism to handle the inevitable stale-entry and crashed-instance scenarios that arise in production.