WebSockets & Real-Time Communication

18 min Lesson 7 of 10

Every HTTP request follows the same choreography: client asks, server answers, connection closes (or idles). This request-response cycle is efficient for fetching a web page or submitting a form, but it becomes a bottleneck the moment your product needs to push data to the client the instant something changes — a chat message arriving, a stock price moving, a multiplayer game state updating. WebSockets solve exactly this problem by turning a single HTTP connection into a persistent, full-duplex channel where either side can send a message at any time.

The WebSocket Handshake

A WebSocket connection begins as a perfectly ordinary HTTP/1.1 request. The client sends an Upgrade header signalling its intent, the server agrees with a 101 Switching Protocols response, and from that point the underlying TCP connection is repurposed — the HTTP framing is discarded and replaced by the lightweight WebSocket framing protocol (RFC 6455). No new TCP handshake, no TLS renegotiation; the upgrade is in-place.

WebSocket upgrade handshake followed by bidirectional frame exchange. Either side can send at any time; either side can initiate the close.
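The server's side of the upgrade can be sketched in a few lines. RFC 6455 has the client send a random Sec-WebSocket-Key header, and the server proves it understood the request by hashing that key with a fixed GUID and returning the result as Sec-WebSocket-Accept. The function names below are illustrative, not taken from any particular library, and a real server would also validate the request headers before answering.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455; every conforming server uses this exact value.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key: str) -> str:
    """Return the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_upgrade_response(sec_websocket_key: str) -> str:
    """Build the 101 response head that completes the handshake (hypothetical helper)."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {compute_accept(sec_websocket_key)}\r\n"
        "\r\n"
    )


if __name__ == "__main__":
    # Sample key taken from the example handshake in RFC 6455.
    print(build_upgrade_response("dGhlIHNhbXBsZSBub25jZQ=="))
```

After this response is written to the socket, both sides stop speaking HTTP and exchange WebSocket frames on the same TCP connection.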

The key numbers to remember: the WebSocket frame header overhead is just 2 to 14 bytes per message, compared to the 400–900 bytes of HTTP headers that would accompany every polling request. At high message frequencies — a trading terminal receiving 1,000 price ticks per second — this difference is not academic.
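To see where the 2 and 14 come from, the header size can be computed from the RFC 6455 framing rules: 2 fixed bytes, an extended length field for larger payloads, and a 4-byte masking key on client-to-server frames. The sketch below also reruns the trading-terminal arithmetic; the 40-byte tick payload is an assumption made for illustration, and 400 bytes is the low end of the HTTP header range quoted above.

```python
def frame_header_size(payload_len: int, masked: bool) -> int:
    """Size in bytes of a WebSocket frame header under RFC 6455 framing."""
    size = 2  # byte 1: FIN + opcode, byte 2: mask bit + 7-bit length
    if payload_len > 65535:
        size += 8  # 64-bit extended payload length
    elif payload_len > 125:
        size += 2  # 16-bit extended payload length
    if masked:
        size += 4  # masking key, required on client-to-server frames
    return size


def overhead_per_second(bytes_per_message: int, messages_per_second: int) -> int:
    """Header bytes spent per second at a given message rate."""
    return bytes_per_message * messages_per_second


TICKS_PER_SECOND = 1_000  # rate from the trading-terminal example
TICK_PAYLOAD_BYTES = 40   # assumed payload size, for illustration only

ws_overhead = overhead_per_second(
    frame_header_size(TICK_PAYLOAD_BYTES, masked=False), TICKS_PER_SECOND
)
http_overhead = overhead_per_second(400, TICKS_PER_SECOND)

if __name__ == "__main__":
    print(f"WebSocket header bytes/s: {ws_overhead}")
    print(f"HTTP header bytes/s:      {http_overhead}")
```

Under these assumptions the server spends 2,000 bytes per second on WebSocket headers against 400,000 bytes per second on HTTP headers, a factor of 200.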

When Real-Time Actually Matters

Not every "live" feature needs WebSockets. Before choosing a technology, define your latency and frequency requirements:

Chat & messaging — Slack, WhatsApp Web. Users expect messages to appear in under a second. High connection counts (millions of persistent connections), moderate per-user message rates.
Collaborative editing — Google Docs, Figma. Cursor positions and delta operations must sync across all editors with tens-of-milliseconds latency. Missed operations break document consistency.
Live trading / market data — Bloomberg Terminal, Robinhood. Price quotes can arrive thousands of times per second per instrument. Any per-message HTTP overhead is prohibitive.
Multiplayer gaming — Player positions, health, physics state. Latency above ~100 ms is perceptible and ruins gameplay.
Live dashboards & monitoring — Grafana live panels, deployment logs streaming to a browser. Server pushes data; the client rarely sends back. SSE can work here too.
Notifications — "Your order shipped." Infrequent, one-way. A simple SSE stream or even long polling is sufficient; full WebSockets may be over-engineered.

Key question: Does the client need to send data to the server frequently, or only receive it? Full-duplex WebSockets shine when traffic is heavy in both directions. If traffic is mostly server-to-client, SSE is simpler and HTTP/2 compatible.

Architecture: Scaling WebSocket Servers

A single WebSocket server process can hold tens of thousands of simultaneous connections, but real systems need multiple servers. This introduces a fundamental problem: a message sent by Client A (connected to Server 1) must reach Client B (connected to Server 2).

The standard solution is a Pub/Sub backplane, typically Redis Pub/Sub or a message broker. Each server subscribes to the channels relevant to its connected clients. When a client message arrives on any server, that server publishes it to the backplane; the backplane fans it out to every server subscribed to that channel, and each one forwards it to its matching connections.

Three WebSocket servers behind a load balancer, all connected to a Redis Pub/Sub backplane. A message published on Server 1 reaches clients on Server 2 and 3 via the backplane.
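The fan-out path can be modelled without any network code. The sketch below replaces Redis with an in-memory Backplane class and represents each client connection as a plain list acting as its inbox; both classes and all of their method names are hypothetical stand-ins, not a real client API.

```python
from collections import defaultdict


class Backplane:
    """In-memory stand-in for Redis Pub/Sub: maps a channel to subscribed servers."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, server):
        if server not in self._subscribers[channel]:
            self._subscribers[channel].append(server)

    def publish(self, channel, message):
        # Every subscribed server receives the message, including the publisher.
        for server in list(self._subscribers[channel]):
            server.deliver(channel, message)


class GatewayServer:
    """One WebSocket server node holding its own set of client connections."""

    def __init__(self, name, backplane):
        self.name = name
        self.backplane = backplane
        self.local = defaultdict(list)  # channel -> inboxes of clients on this node

    def connect(self, channel):
        """Register a client on this node and return its inbox."""
        inbox = []
        self.local[channel].append(inbox)
        self.backplane.subscribe(channel, self)
        return inbox

    def on_client_message(self, channel, message):
        # Never deliver locally first: publishing keeps one code path for all nodes.
        self.backplane.publish(channel, message)

    def deliver(self, channel, message):
        for inbox in self.local[channel]:
            inbox.append(message)


if __name__ == "__main__":
    bus = Backplane()
    s1, s2 = GatewayServer("ws-1", bus), GatewayServer("ws-2", bus)
    alice, bob = s1.connect("room:42"), s2.connect("room:42")
    s1.on_client_message("room:42", "hello from alice")
    print(bob)
```

Because the publishing server is also a subscriber, the sender's own node receives the message through the same path as every other node. A real deployment would add message ordering and unsubscribe-on-disconnect handling, which this sketch leaves out.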

Sticky sessions matter: An established WebSocket stays pinned to one server for its lifetime, but with standard round-robin load balancing a reconnecting client may land on a different server and lose its subscription state. Either route by consistent hashing on user ID or by sticky sessions (IP hash, cookie-based) so that a reconnecting client reliably returns to the same server node, or keep session state in a shared store so that any node can resume it.
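Consistent hashing on user ID can be sketched as a hash ring: each server is placed at many points on a circle, and a user is routed to the first server point at or after the hash of their ID. The HashRing class, the node names, and the choice of 100 virtual points per node are assumptions for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring mapping user IDs to server nodes."""

    def __init__(self, nodes, replicas=100):
        self._replicas = replicas  # virtual points per node, smooths the distribution
        self._ring = []            # sorted list of (point, node)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")

    def add(self, node: str) -> None:
        for i in range(self._replicas):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def node_for(self, user_id: str) -> str:
        if not self._ring:
            raise LookupError("no nodes in ring")
        # First point clockwise from the user's hash, wrapping at the end.
        idx = bisect.bisect(self._ring, (self._hash(user_id), ""))
        return self._ring[idx % len(self._ring)][1]


if __name__ == "__main__":
    ring = HashRing(["ws-1", "ws-2", "ws-3"])
    print(ring.node_for("user-1001"))
    ring.remove("ws-2")
    print(ring.node_for("user-1001"))
```

The property that matters for reconnects is that removing one node only remaps the users who were on that node; everyone else keeps returning to the same server.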

Connection Management & Heartbeats

An idle TCP connection looks abandoned to intermediate network equipment: firewalls and NAT devices keep idle-connection timers, typically 30–300 seconds, after which they silently drop their state for the connection. Neither the client nor the server finds out until it tries to send a message and receives no acknowledgement.

The solution is a ping/pong heartbeat: the server sends a WebSocket Ping control frame every 20–30 seconds; the client is required by the protocol to reply with a Pong. If the server receives no Pong within a timeout window, it closes the connection. The client, noticing the close, triggers a reconnect with exponential backoff — typically starting at 1 second, doubling up to a cap of 30–60 seconds, with a small random jitter to prevent thousands of clients reconnecting simultaneously (the thundering herd problem).
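Both halves of this scheme are small state machines, sketched below with no network code. HeartbeatMonitor tracks one connection's Ping and Pong timing on the server, and reconnect_delay computes the client's wait before each retry. The names, the 25-second interval, the 10-second Pong timeout, and the 20% jitter are assumed values chosen to fall inside the ranges above; timestamps are passed in by the caller so the logic stays testable.

```python
import random
from typing import Optional


class HeartbeatMonitor:
    """Tracks Ping/Pong timing for one connection; the caller supplies timestamps."""

    def __init__(self, interval: float = 25.0, pong_timeout: float = 10.0, now: float = 0.0):
        self.interval = interval
        self.pong_timeout = pong_timeout
        self.last_activity = now  # time of connection open or of the last Pong
        self.ping_sent_at: Optional[float] = None  # set while a Ping is outstanding

    def ping_due(self, now: float) -> bool:
        return self.ping_sent_at is None and now - self.last_activity >= self.interval

    def on_ping_sent(self, now: float) -> None:
        self.ping_sent_at = now

    def on_pong(self, now: float) -> None:
        self.ping_sent_at = None
        self.last_activity = now

    def is_dead(self, now: float) -> bool:
        """True when a Ping has gone unanswered for longer than the timeout."""
        return self.ping_sent_at is not None and now - self.ping_sent_at > self.pong_timeout


def reconnect_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                    jitter: float = 0.2, rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before reconnect number `attempt` (0-based)."""
    source = rng if rng is not None else random
    # Clamp the exponent so very large attempt counts cannot overflow a float.
    delay = min(cap, base * 2 ** min(attempt, 32))
    # Spread clients out by up to +/- jitter of the delay.
    return delay * (1 + source.uniform(-jitter, jitter))


if __name__ == "__main__":
    for attempt in range(7):
        print(attempt, round(reconnect_delay(attempt), 2))
```

The jitter is applied after the cap, so even clients that have all reached the maximum delay do not retry in lockstep.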

Security Considerations

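WebSocket connections should always use wss:// (WebSocket over TLS, the equivalent of HTTPS). Unlike ordinary script-initiated requests, the upgrade handshake is not restricted by the Same-Origin Policy: a browser will open a WebSocket to any host from any page, sending an Origin header along with whatever cookies it would attach for that host, and it enforces no origin checks on frames once the connection is established. The server has to check the Origin header itself and enforce everything else. This means: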

Authenticate on connect: verify a token or session cookie during the HTTP upgrade request, before the WebSocket is opened, wherever the client can supply one there. If the credential has to arrive in-band, accept no other message until it has been validated.
Validate every message: treat WebSocket messages from clients as untrusted input, exactly like HTTP request bodies. Sanitise and validate.
Rate-limit at the connection and message level: a single client opening thousands of connections or sending millions of small frames per second is a DoS vector. Enforce limits in your WS server or gateway layer.
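The last two items can be sketched as a per-connection token bucket plus a strict message parser. The class and function names, the 4 KB size cap, and the two allowed message types are assumptions for illustration; a real service would define its own schema and limits.

```python
import json


class TokenBucket:
    """Per-connection message limiter: `rate` tokens per second, up to `burst` stored."""

    def __init__(self, rate: float, burst: int, now: float = 0.0):
        self.rate = rate
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.updated_at = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated_at = max(self.updated_at, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


MAX_MESSAGE_BYTES = 4096           # assumed cap, tune per application
ALLOWED_TYPES = {"chat", "typing"}  # hypothetical message types


def parse_client_message(raw: str) -> dict:
    """Parse and validate one text frame; raise ValueError on anything unexpected."""
    if len(raw.encode("utf-8")) > MAX_MESSAGE_BYTES:
        raise ValueError("message too large")
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("not valid JSON") from exc
    msg_type = msg.get("type") if isinstance(msg, dict) else None
    if not isinstance(msg_type, str) or msg_type not in ALLOWED_TYPES:
        raise ValueError("unknown message type")
    if not isinstance(msg.get("text", ""), str):
        raise ValueError("text must be a string")
    return msg


if __name__ == "__main__":
    bucket = TokenBucket(rate=10, burst=5)
    print([bucket.allow(now=0.0) for _ in range(6)])
    print(parse_client_message('{"type": "chat", "text": "hi"}'))
```

A server would typically call allow() before parsing each frame and close the connection after repeated rejections, so an abusive client costs as little work as possible.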

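Do not store authentication in the URL. It is common to see wss://api.example.com/ws?token=SECRET, which leaks the token into server access logs, proxy logs, and the browser history. Pass the token in a cookie set during login (the browser transmits it automatically on the upgrade request, so pair it with a server-side Origin check), or send it as the first message frame after the connection opens. In the second case the server should treat the connection as unauthenticated and close it if a valid token does not arrive promptly.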
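A cookie-based check at upgrade time might look like the sketch below. It assumes the server framework exposes the request headers as a dictionary with canonical header names, that the session cookie is called "session", and that sessions live in a simple lookup table; all three are placeholders for whatever the real stack provides.

```python
from http.cookies import CookieError, SimpleCookie
from typing import Optional


def authorize_upgrade(headers: dict, allowed_origins: set, sessions: dict) -> Optional[str]:
    """Return the user for an acceptable upgrade request, or None to reject it."""
    # Browsers do not block cross-site WebSocket handshakes, so check Origin here.
    if headers.get("Origin") not in allowed_origins:
        return None

    cookie = SimpleCookie()
    try:
        cookie.load(headers.get("Cookie", ""))
    except CookieError:
        return None

    morsel = cookie.get("session")  # hypothetical cookie name
    if morsel is None:
        return None
    return sessions.get(morsel.value)  # None when the session is unknown or expired


if __name__ == "__main__":
    sessions = {"abc123": "alice"}  # stand-in for a real session store
    request_headers = {
        "Origin": "https://app.example.com",
        "Cookie": "theme=dark; session=abc123",
    }
    print(authorize_upgrade(request_headers, {"https://app.example.com"}, sessions))
```

When the function returns None, the server answers the upgrade request with an HTTP error status such as 403 instead of 101, so no WebSocket is ever opened for that client.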

WebSockets vs. the Alternatives

In practice, WebSockets sit alongside two other techniques for server-pushed data. The right choice depends on your traffic pattern:

Long Polling — The client sends an HTTP request; the server holds it open until data is available, then responds. The client immediately sends a new request. Works everywhere, high latency overhead, not truly bidirectional.
Server-Sent Events (SSE) — A single long-lived HTTP response stream; the server pushes newline-delimited text events. Natively browser-supported, HTTP/2 multiplexed, perfect for one-way feeds. No client-to-server path.
WebSockets — Full-duplex, binary or text frames, low overhead. Required when the client sends frequently (gaming input, collaborative edits, chat typing indicators).
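For comparison with WebSocket frames, the SSE wire format is plain text: each event is a group of "field: value" lines ended by a blank line, and multi-line payloads repeat the data field. A minimal formatter, with a hypothetical function name and parameter names, could look like this:

```python
from typing import Optional


def format_sse_event(data: str, event: Optional[str] = None,
                     event_id: Optional[str] = None) -> str:
    """Serialise one event in the text/event-stream format."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    if event_id is not None:
        lines.append(f"id: {event_id}")
    # Each line of the payload gets its own data field.
    for line in data.split("\n"):
        lines.append(f"data: {line}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


if __name__ == "__main__":
    print(format_sse_event('{"symbol": "ACME", "price": 101.5}', event="tick", event_id="7"))
```

The server writes these strings to a single long-lived response with Content-Type text/event-stream. There is no equivalent path for the client to write back, which is the limitation noted above.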

Lesson 9 of this tutorial covers the full comparison in detail. For now: if you need bidirectional, low-latency, high-frequency communication — WebSockets are the right tool.

Real-world scale: Slack maintains roughly 10 million simultaneous WebSocket connections to serve its global user base, and Discord has reported handling over 100 million WebSocket messages per second across its infrastructure. Systems of this size depend on some form of backplane between their WebSocket gateway nodes, whether Redis Pub/Sub or a purpose-built equivalent.