Stateless vs Stateful Services
One of the most consequential decisions in system design is whether a service should store state internally or delegate that state elsewhere. This single choice determines whether you can spin up ten more servers in sixty seconds — or whether adding a second server breaks everything.
What Is "State"?
State is any information that must persist between requests to serve future requests correctly. Common examples include:
- A logged-in user's session data (who they are, what they are allowed to do)
- Items in a shopping cart mid-checkout
- A partially uploaded file buffer
- A WebSocket connection's message history
A stateless service keeps none of this data in its own memory. Every incoming request carries all the information the server needs to handle it — or the server fetches that information from a shared external store. Once the response is sent, the server forgets everything about that request.
A stateful service remembers things between requests in its own process memory or local disk. Subsequent requests from the same client must reach the same server instance, or the service breaks.
Why Stateless Services Scale Horizontally
Imagine you have one application server handling 1,000 requests per minute. Traffic doubles. With a stateless service, the fix is mechanical: launch a second identical server, put a load balancer in front of both, and route requests to either one. Neither server knows or cares which requests the other is handling — they are perfectly interchangeable replicas.
With a stateful service, that move is blocked. If Server A holds Alice's session in RAM and the load balancer sends Alice's next request to Server B, Server B has no idea who Alice is. She gets logged out, or worse, sees corrupted data.
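The failure can be reproduced in a few lines. This is a minimal sketch, assuming a hypothetical `StatefulServer` class that keeps sessions in a per-instance dictionary and a round-robin balancer built from `itertools.cycle`; none of these names come from a real framework.

```python
from itertools import cycle


class StatefulServer:
    """Hypothetical app server that keeps sessions in its own memory."""

    def __init__(self, name: str) -> None:
        self.name = name
        # session_id -> username, visible only to this instance
        self.sessions: dict[str, str] = {}

    def login(self, session_id: str, username: str) -> str:
        self.sessions[session_id] = username
        return f"{self.name}: logged in {username}"

    def whoami(self, session_id: str) -> str:
        username = self.sessions.get(session_id)
        if username is None:
            return f"{self.name}: unknown session"
        return f"{self.name}: hello {username}"


servers = [StatefulServer("A"), StatefulServer("B")]
round_robin = cycle(servers)  # simplest possible load balancer

first = next(round_robin).login("sess-1", "alice")  # handled by A
second = next(round_robin).whoami("sess-1")         # handled by B
print(first)
print(second)
```

Server B answers with "unknown session" because Alice's session exists only in Server A's dictionary.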
The "Sticky Session" Workaround — and Why It Fails
Teams running stateful services often reach for sticky sessions (also called session affinity): the load balancer tags each client with a cookie and always routes that client to the same server. This makes the service appear to scale, but introduces serious problems:
- Uneven load. High-traffic users are pinned to one server. That server heats up while others idle. The load balancer is no longer balancing.
- No graceful failover. If Server A dies, all of its pinned users lose their sessions instantly. There is nothing to fall back to.
- Deployment friction. Rolling deploys become dangerous: draining a server's sticky users before taking it offline requires careful orchestration.
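- Hard cap on scale. Each server's RAM caps the number of sessions it can hold, and because existing sessions cannot move, adding servers only helps new clients. An overloaded server stays overloaded until its sessions expire.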
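The failover problem above can be sketched as follows. The sketch assumes affinity is decided by hashing a client ID, which stands in for the affinity cookie a real load balancer would set; `pick_server` and the server names are hypothetical.

```python
import hashlib


def pick_server(client_id: str, servers: list[str]) -> str:
    """Pin a client to one server by hashing its ID (stand-in for an affinity cookie)."""
    digest = hashlib.sha256(client_id.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


servers = ["A", "B", "C"]
# Per-server session memory: this is the state that sticky routing protects.
sessions: dict[str, dict[str, dict]] = {name: {} for name in servers}

clients = [f"user-{i}" for i in range(30)]
for client in clients:
    home = pick_server(client, servers)
    sessions[home][client] = {"cart": []}

# Simulate server A crashing: its memory, and every session in it, is gone.
lost = sessions.pop("A")
survivors = [name for name in servers if name != "A"]

# Each displaced client is re-routed to a server that has never seen its session.
relocated = {client: pick_server(client, survivors) for client in lost}
orphaned = [c for c, server in relocated.items() if c not in sessions[server]]
print(f"{len(lost)} sessions lost, {len(orphaned)} clients must start over")
```

Every client that was pinned to the failed server ends up orphaned, since no other server ever held a copy of its session.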
How to Externalise State
The correct fix is to move state out of the server process entirely, into a shared store that every server instance can reach. The three main patterns:
1. Shared Session Store (e.g., Redis)
Instead of writing session data to local RAM, write it to a Redis cluster. Every server reads and writes to the same Redis. The session ID travels in a cookie; the actual session data lives in Redis. Adding a new server costs zero migration — it just starts reading Redis too.
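A minimal sketch of the pattern, with one loud assumption: `SharedStore` is an in-memory stand-in for Redis so the example runs without a network. In a real deployment a Redis client would take its place. `StatelessServer` and the `session:` key prefix are also hypothetical.

```python
import json
from itertools import cycle
from typing import Optional


class SharedStore:
    """In-memory stand-in for a shared store such as Redis (assumption for this sketch)."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def set(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


class StatelessServer:
    """Hypothetical app server that holds no session data of its own."""

    def __init__(self, name: str, store: SharedStore) -> None:
        self.name = name
        self.store = store

    def login(self, session_id: str, username: str) -> None:
        self.store.set(f"session:{session_id}", json.dumps({"user": username}))

    def whoami(self, session_id: str) -> str:
        raw = self.store.get(f"session:{session_id}")
        if raw is None:
            return f"{self.name}: unknown session"
        return f"{self.name}: hello {json.loads(raw)['user']}"


store = SharedStore()
pool = [StatelessServer("A", store), StatelessServer("B", store)]
balancer = cycle(pool)

next(balancer).login("sess-1", "alice")    # handled by A
reply_b = next(balancer).whoami("sess-1")  # handled by B, still works

# A server added later needs no migration: it reads the same store.
reply_c = StatelessServer("C", store).whoami("sess-1")
print(reply_b)
print(reply_c)
```

The servers are interchangeable because the only thing they share is the store. Sessions written through one instance are readable through any other, including instances started afterwards.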
2. Token-Based Authentication (e.g., JWT)
Store no session on the server at all. Issue the client a cryptographically signed token (JWT) that contains the identity and permissions inline. The server validates the signature on every request — no database lookup, no shared store needed for auth. Stateless by construction.
3. Delegating to the Client
Some state — UI preferences, shopping cart items — can live in the browser (localStorage, cookies, client-side state). The server becomes a pure function: input in, output out, no memory between calls. This works well for non-sensitive, user-specific presentation state.
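As a small illustration of the "pure function" idea, the sketch below assumes the client sends its full cart (SKUs and quantities) with each request. The `PRICES_CENTS` catalog and `cart_total_cents` function are hypothetical. Prices are looked up on the server rather than taken from the client, since client-held data can be edited by the user.

```python
# Hypothetical read-only price catalog; identical on every server instance.
PRICES_CENTS = {"mug": 1200, "pen": 150}


def cart_total_cents(cart: list[dict]) -> int:
    """Compute a total from the cart carried in the request; nothing is remembered."""
    total = 0
    for line in cart:
        sku, quantity = line["sku"], line["quantity"]
        if sku not in PRICES_CENTS or quantity < 1:
            raise ValueError(f"invalid cart line: {line!r}")
        total += PRICES_CENTS[sku] * quantity
    return total


# The browser keeps the cart (for example in localStorage) and sends it each time.
request_cart = [{"sku": "mug", "quantity": 2}, {"sku": "pen", "quantity": 3}]
total = cart_total_cents(request_cart)
print(total)
```

Calling the function twice with the same cart gives the same total on any server, which is what makes the instances interchangeable.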
Diagram: Externalising Session State with Redis
When Stateful Is Intentional
Not all stateful services are design mistakes. Some components are inherently stateful and are designed with that in mind:
- Databases and caches — the entire job is storing state. They scale through replication, sharding, and leader-follower patterns (covered in later lessons).
- Message queues — Kafka, RabbitMQ. They persist messages across brokers with replication and partitioning.
- WebSocket / streaming servers: a live connection is inherently stateful. The usual approach is to route messages through a pub/sub bus (Redis Pub/Sub, Kafka), so any server can publish a message and the server holding the client's connection delivers it.
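The pub/sub approach for connection servers can be sketched as below. `PubSubBus` is an in-memory stand-in for a real bus such as Redis Pub/Sub, and `SocketServer` models a connection as a list of delivered messages; both are assumptions made so the example runs on its own.

```python
from collections import defaultdict
from typing import Callable

Callback = Callable[[str, str], None]


class PubSubBus:
    """In-memory stand-in for a pub/sub bus (assumption for this sketch)."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callback]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callback) -> None:
        self._subscribers[channel].append(callback)

    def publish(self, channel: str, client_id: str, message: str) -> None:
        # Every subscribed server sees every message on the channel.
        for callback in self._subscribers[channel]:
            callback(client_id, message)


class SocketServer:
    """Hypothetical connection server: owns live connections, hears messages via the bus."""

    def __init__(self, name: str, bus: PubSubBus) -> None:
        self.name = name
        # client_id -> messages delivered so far (stand-in for a real socket)
        self.connections: dict[str, list[str]] = {}
        bus.subscribe("outbound", self._on_message)

    def connect(self, client_id: str) -> None:
        self.connections[client_id] = []

    def _on_message(self, client_id: str, message: str) -> None:
        # Only the server that owns the connection delivers; the others ignore it.
        if client_id in self.connections:
            self.connections[client_id].append(message)


bus = PubSubBus()
server_a, server_b = SocketServer("A", bus), SocketServer("B", bus)
server_a.connect("alice")
server_b.connect("bob")

# The publisher does not need to know which server holds Alice's connection.
bus.publish("outbound", "alice", "order shipped")
print(server_a.connections["alice"], server_b.connections["bob"])
```

The connection itself is still pinned to one server, but the knowledge of where to send a message no longer is, so any part of the system can reach any client.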
The distinction is that these services are purpose-built for state management and expose explicit mechanisms for replication and failover. Your application tier should not be doing that job implicitly in RAM.
Practical Checklist: Making a Service Stateless
- Audit every variable your process holds between requests. Sessions, in-memory caches, file locks, connection counters — all of these are state.
- Move session data to Redis (or a database-backed session store).
- Replace session-based identity lookups with JWT validation, or keep thin session tokens that point to Redis entries.
- Move any per-server in-memory cache to a shared cache (Redis, Memcached).
- Ensure uploaded files go to object storage (S3, GCS) — not to the server's local filesystem.
- Remove any background jobs or timers that run inside the web process — move them to a dedicated queue worker.
Key Takeaways
- A stateless service can be horizontally scaled by adding identical replicas — no coordination required.
- A stateful service is pinned: requests must reach the server holding their state, creating bottlenecks and fragility.
- The solution is to externalise state: Redis for sessions, JWTs for identity, object storage for files, queues for async work.
- Stateful infrastructure (databases, queues) is fine because it has built-in mechanisms for replication and failover. Implicit state in the application tier is the problem to eliminate.