Scaling the Application Tier
The application tier is where your business logic lives — the fleet of servers that receives HTTP requests, runs your code, queries databases, and returns responses. When a single server can no longer keep up, you add more. That sounds simple, but adding instances surfaces a problem that is invisible until you try it: sessions. This lesson covers how to run many application instances in parallel and how to handle session state correctly when you do.
Why the Application Tier Scales Horizontally So Easily
Compared to databases, application servers are the easiest component to scale. They typically do CPU and memory work — parsing requests, running logic, serialising responses — and they are generally stateless by design (more on that caveat in a moment). Stateless servers share nothing with each other, so the load balancer can send any request to any instance. Need twice the capacity? Add twice as many servers. The math is nearly linear, at least until you hit a shared downstream bottleneck such as the database.
Real-world numbers give intuition: a well-optimised API server handling 300 req/s per core on a 4-core machine handles roughly 1,200 req/s before latency climbs. At 5,000 req/s you need about five such machines (5,000 / 1,200 ≈ 4.2, rounded up). At 50,000 req/s the same arithmetic gives about 42, or closer to fifty once you leave headroom. The pattern is almost embarrassingly parallel, which is why companies like Netflix run tens of thousands of app instances across availability zones.
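The arithmetic above can be sketched as a small helper. The function name and the headroom parameter are illustrative assumptions, not part of any library; the 300 req/s per core and 4-core figures come from the example in the text. By raw division the 50,000 req/s case comes to 42 machines, and a rounder figure such as fifty corresponds to keeping some capacity in reserve.

```python
import math

def instances_needed(target_rps: float, rps_per_core: float = 300.0,
                     cores: int = 4, headroom: float = 0.0) -> int:
    """Return how many identical machines are needed to serve target_rps.

    headroom is the fraction of each machine's capacity kept in reserve
    (0.2 means planning to run each machine at 80% of its limit).
    """
    usable_per_machine = rps_per_core * cores * (1.0 - headroom)
    return math.ceil(target_rps / usable_per_machine)

print(instances_needed(5_000))                 # 5
print(instances_needed(50_000))                # 42
print(instances_needed(50_000, headroom=0.2))  # 53
```

The estimate only holds while the instances share nothing; once a downstream component such as the database saturates, adding machines stops helping.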
The Session Problem
Most web applications maintain sessions — small pieces of state that persist across multiple requests for the same user. Classic examples are a shopping cart, an authenticated user ID, a CSRF token, or a multi-step form's progress. Traditionally, sessions were stored in a file or in memory on the server that handled the first request. That works perfectly with one server. It breaks the moment you add a second.
Consider a user who logs in on App Server 1. Their session is created on that machine. On the next request, the load balancer routes them to App Server 2. Server 2 has no record of their session, so the user appears logged out. This is the classic session-state problem (the one sticky sessions were invented to work around), and it has three main solutions.
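A minimal sketch of the failure, assuming a hypothetical AppServer class that keeps sessions in a local dictionary and a round-robin load balancer modelled with itertools.cycle:

```python
from itertools import cycle

class AppServer:
    """Hypothetical app server that keeps sessions in local memory."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # session_id -> user_id, visible to this instance only

    def login(self, session_id, user_id):
        self.sessions[session_id] = user_id

    def whoami(self, session_id):
        return self.sessions.get(session_id)  # None means "not logged in"

servers = [AppServer("app-1"), AppServer("app-2")]
round_robin = cycle(servers)  # stand-in for the load balancer

first = next(round_robin)     # request 1 lands on app-1
first.login("sid-abc", "alice")

second = next(round_robin)    # request 2 lands on app-2
print(first.name, first.whoami("sid-abc"))    # app-1 alice
print(second.name, second.whoami("sid-abc"))  # app-2 None
```

The session exists, but only on the instance that created it. Each of the three solutions changes either where the request goes or where the state lives.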
Solution 1 — Sticky Sessions (Session Affinity)
The load balancer uses a cookie or IP hash to ensure every request from a given user always goes to the same server. The implementation is simple — most load balancers support it with a single flag — and it requires zero application changes.
Problems:
- If the pinned server restarts or is removed from rotation for a deploy, every session on it is lost and those users are effectively logged out.
- Load becomes uneven. A small set of "power users" can saturate one instance while others sit idle.
- Auto-scaling is complicated: you cannot easily drain a server without migrating sessions first.
- Debugging is harder because you cannot reproduce a user's experience from a different machine.
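To make the IP-hash variant and its first drawback concrete, here is a rough sketch. The pick_server function, server names, and client addresses are all made up for illustration, and real load balancers use their own hashing schemes (often consistent hashing, which moves fewer clients when the pool changes).

```python
import hashlib

def pick_server(client_ip: str, servers: list[str]) -> str:
    """Map a client IP to one server using a stable hash modulo the pool size."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]

pool = ["app-1", "app-2", "app-3"]
clients = [f"10.0.0.{i}" for i in range(1, 101)]

# Same client and same pool always give the same server: that is the affinity.
before = {ip: pick_server(ip, pool) for ip in clients}
assert all(pick_server(ip, pool) == before[ip] for ip in clients)

# Take app-3 out of rotation for a deploy and see who gets re-routed.
smaller_pool = ["app-1", "app-2"]
after = {ip: pick_server(ip, smaller_pool) for ip in clients}
moved = sum(1 for ip in clients if before[ip] != after[ip])
print(f"{moved} of {len(clients)} clients now land on a different server")
```

Every client that moves arrives at a server holding no session for it. With plain modulo hashing that can include clients whose original server is still running, not only those that were pinned to the removed instance.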
Solution 2 — Centralised Session Store
Sessions are stored in a shared external store — typically Redis or Memcached — that every application instance can reach. When a request arrives, the app server reads the session from Redis using the session ID from the cookie, handles the request, and writes any changes back. It does not matter which physical server handled the request.
Redis is the near-universal choice for session storage because:
- Sub-millisecond read/write latency (~0.1–0.5 ms on localhost, ~1 ms over a LAN).
- Built-in TTL: sessions expire automatically, keeping the store clean.
- Persistence options (RDB/AOF) protect against Redis restarts.
- Cluster mode scales to billions of keys if needed.
The cost is a network round-trip to Redis on every authenticated request. In practice this is 0.5–2 ms and is negligible compared to the database query that follows. The benefit — full horizontal freedom in the app tier — is almost always worth it.
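Most frameworks support this with a small configuration change: in Laravel, set SESSION_DRIVER=redis; in Express, use connect-redis; in Spring Boot, use Spring Session with Redis. In many cases you change a line or two of configuration and gain a horizontally scalable session layer.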
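The read-and-write flow of a centralised store can be sketched without a running Redis server. In the example below, InMemoryStore is a stand-in written for this lesson: its setex and get methods only imitate the shape of the Redis SETEX and GET commands, and the AppServer class, key prefix, and 30-minute TTL are assumptions.

```python
import json
import secrets
import time

class InMemoryStore:
    """Stand-in for a shared store; imitates the shape of Redis SETEX/GET."""

    def __init__(self):
        self._data = {}  # key -> (expires_at, value)

    def setex(self, key, ttl_seconds, value):
        self._data[key] = (time.monotonic() + ttl_seconds, value)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or entry[0] <= time.monotonic():
            self._data.pop(key, None)  # expired entries disappear, like a TTL
            return None
        return entry[1]

SESSION_TTL = 1800  # seconds; an assumed 30-minute timeout

class AppServer:
    def __init__(self, name, store):
        self.name = name
        self.store = store  # the same store object is shared by every instance

    def login(self, user_id):
        session_id = secrets.token_urlsafe(32)
        payload = json.dumps({"user_id": user_id})
        self.store.setex(f"session:{session_id}", SESSION_TTL, payload)
        return session_id  # would be sent to the browser in a cookie

    def current_user(self, session_id):
        raw = self.store.get(f"session:{session_id}")
        return json.loads(raw)["user_id"] if raw else None

store = InMemoryStore()
app1, app2 = AppServer("app-1", store), AppServer("app-2", store)

sid = app1.login("alice")      # session created via app-1
print(app2.current_user(sid))  # alice (read via app-2)
```

Because the instances hold only a reference to the store, any of them can be stopped or replaced without touching a single session. Deleting the key from the store is also an immediate logout, which is what makes this approach easy to revoke.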
Solution 3 — Stateless Tokens (JWT / Signed Cookies)
The cleanest architectural solution is to push state out of the server entirely. Instead of a session ID that references server-side storage, the token is the state. A JWT (JSON Web Token) or an HMAC-signed cookie embeds the user ID, roles, and expiry inside the cookie itself, cryptographically signed so it cannot be tampered with. The server only needs the signing key — no storage required.
When a request arrives, the server verifies the signature and reads the claims directly from the token. Any instance can verify any token independently, with no inter-server communication and no round-trip to a store. Token-based authentication of this kind is widely used by large public APIs and in modern microservice systems.
Trade-offs to understand:
- Token revocation is hard. A JWT is valid until it expires. If you need to invalidate a session immediately (e.g., account compromise), you must either maintain a small blocklist (defeating some of the statelessness) or use very short expiry times (~15 minutes) with refresh tokens.
- Token size. A JWT with standard claims is ~300–500 bytes, sent on every request. A session ID cookie is typically a few dozen bytes. At very high volume, the extra bandwidth and header parsing are measurable (though rarely the bottleneck).
- Sensitive data in tokens. A JWT payload is Base64url-encoded, not encrypted, so anyone who holds the token can read it. Do not put secrets or PII in the payload unless you use JWE (encrypted JWTs).
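The last point is easy to verify for yourself. The sketch below builds a JWT-shaped string by hand (the claims and the placeholder signature are invented for the demonstration) and then reads the payload back using nothing but Base64 decoding.

```python
import base64
import json

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def read_payload_without_key(token: str) -> dict:
    """Decode the middle segment of a JWT-shaped token. No key is involved."""
    payload_segment = token.split(".")[1]
    padded = payload_segment + "=" * (-len(payload_segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode("utf-8"))
payload = b64url(json.dumps({"sub": "alice", "email": "alice@example.com"}).encode("utf-8"))
token = f"{header}.{payload}.signature-not-needed-to-read"

print(read_payload_without_key(token))
# {'sub': 'alice', 'email': 'alice@example.com'}
```

The signature stops an attacker from changing the claims; it does nothing to hide them.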
Comparing the Three Approaches
Choosing between these strategies depends on your application's requirements:
- Sticky sessions — Appropriate only for legacy applications that cannot be modified and where occasional session loss on restart is acceptable. Avoid for new systems.
- Centralised session store (Redis) — The best default for most server-rendered web applications, or any app with server-side state that needs to be immediately revocable (e.g., banking, admin tools). Latency overhead is small; operational simplicity is high.
- Stateless JWT / signed cookies — Best for microservice architectures, mobile apps, and public APIs where many independent services must validate identity without a shared store. Requires careful token lifecycle management.
Beyond Sessions: Other Shared State Pitfalls
Sessions are the most visible shared-state problem, but not the only one. Watch for:
- In-process caches — A cache warm-up on Server 1 is cold on Server 2. Use a shared cache (Redis) for data that all instances need.
- Local file writes — Uploading a file to the local filesystem of App Server 1 means it is invisible to App Server 2. Use object storage (S3, GCS, Azure Blob) instead.
- Scheduled jobs — If each instance runs a cron job, the job runs N times. Use a distributed lock or a dedicated job scheduler to ensure exactly-once execution.
- WebSocket connections — A WebSocket connection is pinned to one server. Broadcasting a message to all clients of a user requires a pub/sub layer (Redis Pub/Sub, Ably, Pusher) so any server can publish to any connection regardless of which server holds it.
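For the scheduled-jobs pitfall, the distributed-lock idea can be sketched as follows. LockStore is a single-process stand-in written for this lesson; it imitates "set if absent, with expiry" (the semantics of Redis SET with the NX and EX options). The job name, TTL, and instance names are assumptions.

```python
import time

class LockStore:
    """Stand-in for a shared store offering 'set if absent, with expiry'."""

    def __init__(self):
        self._locks = {}  # key -> (owner, expires_at)

    def acquire(self, key, owner, ttl_seconds):
        now = time.monotonic()
        current = self._locks.get(key)
        if current is not None and current[1] > now:
            return False  # another instance holds an unexpired lock
        self._locks[key] = (owner, now + ttl_seconds)
        return True

def run_nightly_report(instance, store, ran):
    # Every instance's cron fires, but only the lock winner does the work.
    if store.acquire("job:nightly-report", instance, ttl_seconds=300):
        ran.append(instance)

store = LockStore()
ran = []
for instance in ["app-1", "app-2", "app-3"]:
    run_nightly_report(instance, store, ran)

print(ran)  # ['app-1']
```

In a real deployment the check and the write must be a single atomic operation in the shared store, which this in-process stand-in gets for free. A lock like this limits a job to one runner per TTL window; it does not by itself guarantee the job completes exactly once, so it is worth keeping jobs idempotent as well.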
Deployment Mechanics: Rolling Deploys and Blue-Green
Once your application tier is stateless, deploying new versions without downtime becomes straightforward. Two common patterns:
- Rolling deploy: Replace instances one at a time (or a few at a time). The load balancer drains traffic from each instance before it is stopped (connection draining), waits for in-flight requests to complete, then brings up the new version. At no point are all instances down simultaneously. GitHub deploys this way hundreds of times per day.
- Blue-green deploy: Maintain two identical environments (blue and green). The live load balancer points at blue. You deploy the new version to green, run smoke tests, then flip the load balancer to green in seconds. If anything is wrong, flip back to blue. Zero downtime and instant rollback.
Both patterns are painful with sticky sessions backed by local memory, because moving traffic away from an instance logs out every user pinned to it. With session state held outside the instance (in a shared store or a signed token), both become routine.
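The rolling-deploy loop can be modelled in a few lines. This is a toy simulation under stated assumptions: the fleet is a dictionary of instance name to version, draining and health checks are reduced to a single yield, and the function name is made up.

```python
def rolling_deploy(instances: dict[str, str], new_version: str, batch_size: int = 1):
    """Upgrade instances in batches, yielding who is draining and who still serves."""
    names = list(instances)
    for start in range(0, len(names), batch_size):
        batch = names[start:start + batch_size]
        in_service = [n for n in names if n not in batch]
        yield batch, in_service            # batch is drained; the rest keep serving
        for name in batch:
            instances[name] = new_version  # replaced and returned to rotation

fleet = {"app-1": "v1", "app-2": "v1", "app-3": "v1", "app-4": "v1"}
for draining, serving in rolling_deploy(fleet, "v2", batch_size=1):
    print(f"draining {draining}, still serving: {serving}")

print(fleet)
# {'app-1': 'v2', 'app-2': 'v2', 'app-3': 'v2', 'app-4': 'v2'}
```

As long as the batch size is smaller than the fleet, some instances are always in service. During the rollout both versions serve traffic side by side, so the two versions need to be able to read each other's session format.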