Scaling & Load Balancing

Scaling the Application Tier

The application tier is where your business logic lives — the fleet of servers that receives HTTP requests, runs your code, queries databases, and returns responses. When a single server can no longer keep up, you add more. That sounds simple, but adding instances surfaces a problem that is invisible until you try it: sessions. This lesson covers how to run many application instances in parallel and how to handle session state correctly when you do.

Why the Application Tier Scales Horizontally So Easily

Compared to databases, application servers are the easiest component to scale. They typically do CPU and memory work — parsing requests, running logic, serialising responses — and they are generally stateless by design (more on that caveat in a moment). Stateless servers share nothing with each other, so the load balancer can send any request to any instance. Need twice the capacity? Add twice as many servers. The math is nearly linear, at least until you hit a shared downstream bottleneck such as the database.

Real-world numbers give intuition: a well-optimised API server handling 300 req/s per core, running on a 4-core machine, handles roughly 1,200 req/s before latency climbs. At 5,000 req/s you need about five such machines (5,000 / 1,200 ≈ 4.2, rounded up). At 50,000 req/s you need about forty-two, or closer to fifty once you leave headroom for traffic spikes and instance failures. The pattern is almost embarrassingly parallel, which is why companies like Netflix run tens of thousands of app instances across availability zones.
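The arithmetic above can be sketched as a small helper. This is a back-of-the-envelope estimate only; the function name and the `headroom` parameter (the fraction of each machine's capacity held in reserve) are assumptions added for illustration, not part of the lesson's figures.

```python
import math

def instances_needed(target_rps, rps_per_core, cores, headroom=0.0):
    """Machines required to serve target_rps.

    headroom is the fraction of each machine's capacity kept in reserve
    (hypothetical knob); 0.0 means every machine runs at its limit.
    """
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    usable_rps = rps_per_core * cores * (1.0 - headroom)
    return math.ceil(target_rps / usable_rps)

print(instances_needed(5_000, 300, 4))                  # 5
print(instances_needed(50_000, 300, 4))                 # 42
print(instances_needed(50_000, 300, 4, headroom=0.15))  # 50
```

With no reserve, 50,000 req/s fits on 42 machines; keeping roughly 15% of capacity free brings the count to 50.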

The Session Problem

Most web applications maintain sessions — small pieces of state that persist across multiple requests for the same user. Classic examples are a shopping cart, an authenticated user ID, a CSRF token, or a multi-step form's progress. Traditionally, sessions were stored in a file or in memory on the server that handled the first request. That works perfectly with one server. It breaks the moment you add a second.

Consider a user who logs in on App Server 1. Their session is created on that machine. On the next request, the load balancer routes them to App Server 2. Server 2 has no record of their session, so the user appears logged out. This is the classic session-state problem of horizontally scaled applications, and it has three main solutions.
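The failure is easy to reproduce in a toy simulation. The sketch below assumes two instances that each keep sessions in their own process memory and a round-robin load balancer; the `AppServer` class and its method names are hypothetical.

```python
import itertools
import secrets

class AppServer:
    """One application instance that keeps sessions in its own memory."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # session_id -> user_id, local to this process

    def login(self, user_id):
        session_id = secrets.token_urlsafe(16)
        self.sessions[session_id] = user_id
        return session_id

    def whoami(self, session_id):
        # None means "this server has never seen that session"
        return self.sessions.get(session_id)

servers = [AppServer("app-1"), AppServer("app-2")]
round_robin = itertools.cycle(servers)  # stand-in for the load balancer

first = next(round_robin)
sid = first.login("alice")       # request 1 lands on app-1
second = next(round_robin)       # request 2 lands on app-2

print(first.name, first.whoami(sid))    # app-1 alice
print(second.name, second.whoami(sid))  # app-2 None
```

The session ID in the cookie is valid, but only app-1 knows what it refers to.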

Figure: Left, sticky sessions route a user to the same server but fail on restarts. Right, a centralised session store decouples sessions from instances so any server can serve any request.

Solution 1 — Sticky Sessions (Session Affinity)

The load balancer uses a cookie or IP hash to ensure every request from a given user always goes to the same server. The implementation is simple — most load balancers support it with a single flag — and it requires zero application changes.

Problems:

If the pinned server restarts or is removed from rotation for a deploy, every session on it is lost and those users are effectively logged out.
Load becomes uneven. A small set of "power users" can saturate one instance while others sit idle.
Auto-scaling is complicated: you cannot easily drain a server without migrating sessions first.
Debugging is harder because you cannot reproduce a user's experience from a different machine.
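A minimal sketch of hash-based affinity shows both the mechanism and the restart problem. It assumes the balancer hashes a stable client key (an IP address here) and takes the result modulo the pool size; real load balancers differ in the details, and `pick_server` is a hypothetical name.

```python
import hashlib

def pick_server(client_key, servers):
    """Map a client to a server by hashing a stable key (IP or cookie value)."""
    digest = hashlib.sha256(client_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]

servers = ["app-1", "app-2", "app-3"]
client = "203.0.113.7"

# The same client always lands on the same server while the pool is unchanged.
pinned = pick_server(client, servers)
stable = all(pick_server(client, servers) == pinned for _ in range(100))

# Take the pinned server out of rotation (deploy, crash, scale-in).
remaining = [s for s in servers if s != pinned]
moved_to = pick_server(client, remaining)

print(stable)              # True
print(moved_to != pinned)  # True: the in-memory session did not move with the user
```

With plain modulo hashing, changing the pool size can also remap clients whose server is still healthy, which makes the session loss wider than the one instance that went away.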

When to avoid sticky sessions: Any system that needs rolling deploys, auto-scaling, or zero-downtime restarts should not rely on sticky sessions as the primary session strategy. They are a crutch that defers the problem rather than solving it.

Solution 2 — Centralised Session Store

Sessions are stored in a shared external store — typically Redis or Memcached — that every application instance can reach. When a request arrives, the app server reads the session from Redis using the session ID from the cookie, handles the request, and writes any changes back. It does not matter which physical server handled the request.

Redis is the near-universal choice for session storage because:

Low read/write latency (~0.1–0.5 ms on localhost, ~1 ms over a LAN).
Built-in TTL: sessions expire automatically, keeping the store clean.
Persistence options (RDB/AOF) let sessions survive a Redis restart, though the most recent writes can be lost depending on configuration.
Cluster mode shards keys across nodes if a single instance is no longer enough.

The cost is a network round-trip to Redis on every authenticated request. In practice this is 0.5–2 ms and is negligible compared to the database query that follows. The benefit — full horizontal freedom in the app tier — is almost always worth it.
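The read, handle, write-back cycle can be sketched without a running Redis. The `SessionStore` class below is an in-memory stand-in invented for this example (including its TTL emulation and injectable clock); in a real deployment each `load` and `save` would be a network call to the shared store.

```python
import json
import secrets
import time

class SessionStore:
    """In-memory stand-in for a shared store such as Redis (assumption:
    a real deployment would reach the store over the network)."""

    def __init__(self, ttl_seconds=1800, clock=time.monotonic):
        self._data = {}  # session_id -> (expires_at, json_blob)
        self._ttl = ttl_seconds
        self._clock = clock

    def save(self, session_id, session):
        expires_at = self._clock() + self._ttl
        self._data[session_id] = (expires_at, json.dumps(session))

    def load(self, session_id):
        entry = self._data.get(session_id)
        if entry is None:
            return None
        expires_at, blob = entry
        if self._clock() >= expires_at:  # emulate TTL expiry
            del self._data[session_id]
            return None
        return json.loads(blob)

def handle_login(store, user_id):
    session_id = secrets.token_urlsafe(16)
    store.save(session_id, {"user_id": user_id, "cart": []})
    return session_id

def handle_add_to_cart(store, session_id, item):
    session = store.load(session_id)   # read
    if session is None:
        return None                    # unknown or expired session
    session["cart"].append(item)       # handle
    store.save(session_id, session)    # write back (also refreshes the TTL)
    return session

store = SessionStore()
sid = handle_login(store, "alice")              # served by any instance
print(handle_add_to_cart(store, sid, "book"))   # served by any other instance
```

Because the handlers hold no state of their own, it makes no difference which instance runs which call, as long as every instance talks to the same store.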

Framework support: Most major web frameworks have a Redis session driver. In Laravel it is SESSION_DRIVER=redis; in Express, connect-redis; in Spring Boot, Spring Session with Redis. Switching is usually a small configuration change (plus an added dependency in some stacks) rather than an application rewrite, and it gives you a horizontally scalable session layer.

Solution 3 — Stateless Tokens (JWT / Signed Cookies)

The cleanest architectural solution is to push state out of the server entirely. Instead of a session ID that references server-side storage, the token is the state. A JWT (JSON Web Token) or an HMAC-signed cookie embeds the user ID, roles, and expiry inside the cookie itself, cryptographically signed so that any tampering is detected and the token is rejected. The server only needs the signing key; no session storage is required.

When a request arrives, the server verifies the signature and reads the claims directly from the token. Any instance can verify any token independently, with no inter-server communication and no round-trip to a store. This pattern is widely used in modern microservice systems and in OAuth 2.0 / OpenID Connect deployments.

Trade-offs to understand:

Token revocation is hard. A JWT is valid until it expires. If you need to invalidate a session immediately (e.g., account compromise), you must either maintain a small blocklist (defeating some of the statelessness) or use very short expiry times (~15 minutes) with refresh tokens.
Token size. A JWT with standard claims is ~300–500 bytes, sent on every request. A session ID cookie is typically a few dozen bytes. At very high volume, the extra bandwidth and header parsing are measurable (though rarely the bottleneck).
Sensitive data in tokens. JWT payloads are base64url-encoded, not encrypted, so anyone holding the token can read them. Do not put secrets or PII in the payload unless you use JWE (encrypted JWTs).
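Two of these trade-offs can be shown in a few lines: the payload is readable without the key, and immediate revocation needs some server-side state. The sketch below builds an unsigned, JWT-shaped token purely for illustration; `peek_claims`, `is_revoked`, and the in-memory blocklist are hypothetical, while `jti` (token ID) is one of the registered JWT claim names.

```python
import base64
import json

def _segment(obj):
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

def peek_claims(token):
    """Read a JWT-style payload WITHOUT the signing key (no verification)."""
    payload_segment = token.split(".")[1]
    padded = payload_segment + "=" * (-len(payload_segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

def is_revoked(claims, blocklist):
    """Check the token ID against a blocklist of revoked tokens."""
    return claims.get("jti") in blocklist

# header.payload.signature, with a placeholder where the signature would be
demo_token = ".".join([
    _segment({"alg": "HS256", "typ": "JWT"}),
    _segment({"sub": "alice", "jti": "tok-123", "exp": 1700000900}),
    "signature-placeholder",
])

revoked_token_ids = {"tok-123"}  # hypothetical; in practice kept in a shared store

claims = peek_claims(demo_token)
print(claims["sub"])                          # alice (no key was needed)
print(is_revoked(claims, revoked_token_ids))  # True
```

The blocklist only has to remember revoked IDs until the corresponding tokens expire, which is why it pairs naturally with short expiry times.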

Figure: The session-store approach requires a Redis round-trip on every request; the JWT approach verifies the token in-process (CPU only) and goes straight to the database.

Comparing the Three Approaches

Choosing between these strategies depends on your application's requirements:

Sticky sessions — Appropriate only for legacy applications that cannot be modified and where occasional session loss on restart is acceptable. Avoid for new systems.
Centralised session store (Redis) — The best default for most server-rendered web applications, or any app with server-side state that needs to be immediately revocable (e.g., banking, admin tools). Latency overhead is small; operational simplicity is high.
Stateless JWT / signed cookies — Best for microservice architectures, mobile apps, and public APIs where many independent services must validate identity without a shared store. Requires careful token lifecycle management.

Beyond Sessions: Other Shared State Pitfalls

Sessions are the most visible shared-state problem, but not the only one. Watch for:

In-process caches: A cache warmed on Server 1 is still cold on Server 2. Use a shared cache (Redis) for data that all instances need.
Local file writes: A file uploaded to the local filesystem of App Server 1 is invisible to App Server 2. Use object storage (S3, GCS, Azure Blob) instead.
Scheduled jobs: If each instance runs a cron job, the job runs N times. Use a distributed lock or a dedicated job scheduler so that each scheduled run executes on only one instance.
WebSocket connections: A WebSocket connection is pinned to one server. Broadcasting a message to all of a user's clients requires a pub/sub layer (Redis Pub/Sub, Ably, Pusher) so any server can publish to any connection regardless of which server holds it.
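For the scheduled-job case, the locking idea can be sketched as follows. The `LockStore` class is an in-memory stand-in, assumed here to mimic a shared store that offers an atomic "set if absent, with expiry" operation (Redis exposes this as SET with the NX and EX options); the job name and run key are hypothetical.

```python
import time

class LockStore:
    """In-memory stand-in for a shared store with atomic
    'set if absent, with expiry' (assumption: e.g. Redis SET ... NX EX)."""

    def __init__(self, clock=time.monotonic):
        self._locks = {}  # lock name -> (owner, expires_at)
        self._clock = clock

    def acquire(self, name, owner, ttl_seconds):
        now = self._clock()
        current = self._locks.get(name)
        if current is not None and current[1] > now:
            return False  # another instance holds an unexpired lock
        self._locks[name] = (owner, now + ttl_seconds)
        return True

def run_nightly_report(store, instance_id, run_key):
    """Every instance calls this on schedule; only the lock winner does the work."""
    lock_name = "job:nightly-report:" + run_key
    if not store.acquire(lock_name, instance_id, ttl_seconds=300):
        return False
    # ... the actual job body would go here ...
    return True

store = LockStore()
results = [run_nightly_report(store, "app-%d" % i, "2024-01-01") for i in range(1, 5)]
print(results)  # [True, False, False, False]
```

The TTL keeps a crashed instance from holding the lock forever, but it should comfortably exceed the job's run time, otherwise a second instance could start the same run.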

The golden rule of horizontal scaling: Any state that must survive a request must live outside the application process. Application instances must be disposable — spin up in seconds, shut down without ceremony, and be fully replaceable by any identical instance.

Deployment Mechanics: Rolling Deploys and Blue-Green

Once your application tier is stateless, deploying new versions without downtime becomes straightforward. Two common patterns:

Rolling deploy: Replace instances one at a time (or a few at a time). The load balancer drains traffic from each instance before it is stopped (connection draining), waits for in-flight requests to complete, then brings up the new version. At no point are all instances down simultaneously. GitHub deploys this way hundreds of times per day.
Blue-green deploy: Maintain two identical environments (blue and green). The live load balancer points at blue. You deploy the new version to green, run smoke tests, then flip the load balancer to green in seconds. If anything is wrong, flip back to blue. Zero downtime and instant rollback.
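The rolling pattern can be sketched as a loop over the pool. This is a simulation only: instances are plain dictionaries, and "replace" is a version-string change standing in for the real drain, stop, start, and health-check sequence. The function name and dictionary fields are hypothetical.

```python
def rolling_deploy(pool, new_version, batch_size=1):
    """Replace instances batch by batch; yield the in-service names after each step.

    pool is a list of dicts such as
    {"name": "app-1", "version": "v1", "in_service": True}.
    """
    for start in range(0, len(pool), batch_size):
        batch = pool[start:start + batch_size]
        for instance in batch:
            instance["in_service"] = False     # deregister; draining happens here
        yield [i["name"] for i in pool if i["in_service"]]
        for instance in batch:
            instance["version"] = new_version  # stand-in for replace/restart
            instance["in_service"] = True      # re-register after a health check
        yield [i["name"] for i in pool if i["in_service"]]

pool = [{"name": "app-%d" % n, "version": "v1", "in_service": True} for n in range(1, 5)]
min_in_service = min(len(step) for step in rolling_deploy(pool, "v2"))

print(min_in_service)                 # 3
print({i["version"] for i in pool})   # {'v2'}
```

With four instances and a batch size of one, capacity never drops below three instances at any step. A larger batch finishes sooner but leaves less capacity in service during the deploy.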

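Both patterns are disruptive when sessions live in the memory of sticky instances, because moving traffic away from an instance logs out every user pinned to it. Once session state lives outside the instance (in a shared store or in the token itself), both patterns become routine.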

Connection draining: Always configure your load balancer with a draining timeout (typically 30–120 seconds). This allows the deregistered instance to finish all in-flight requests before it shuts down, preventing half-processed transactions from being aborted.
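The draining timeout behaves roughly like the loop below. This is a simulation under stated assumptions: the `Instance` class, the `tick` callback that completes requests, and the one-second step are all invented for the example, whereas a real load balancer tracks open connections itself.

```python
class Instance:
    def __init__(self, name, in_flight):
        self.name = name
        self.in_flight = in_flight  # requests currently being processed
        self.accepting = True

def drain(instance, timeout_seconds, tick, step_seconds=1):
    """Stop new traffic, then wait for in-flight requests up to a timeout.

    tick(instance) is a hypothetical callback that advances the simulation by
    one step. Returns the number of requests still in flight when draining ends.
    """
    instance.accepting = False
    waited = 0
    while instance.in_flight > 0 and waited < timeout_seconds:
        tick(instance)
        waited += step_seconds
    return instance.in_flight

def finish_one_request(instance):
    instance.in_flight -= 1

quick = Instance("app-1", in_flight=3)
print(drain(quick, timeout_seconds=30, tick=finish_one_request))  # 0

slow = Instance("app-2", in_flight=100)
print(drain(slow, timeout_seconds=30, tick=finish_one_request))   # 70
```

A non-zero return value means the timeout expired with work still running, and those requests would be cut off at shutdown. That is the signal to lengthen the timeout or shorten the slowest requests.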