Web Servers & Reverse Proxies

Web Server Architecture

18 min Lesson 1 of 28

Every HTTP request your application serves passes through a web server. Understanding how that server handles connections — especially under concurrency — is the foundation of production performance tuning, capacity planning, and debugging traffic spikes. This lesson compares the two dominant architectural models used by Nginx and Apache, and explains why the choice matters at scale.

The Core Problem: Concurrency

A web server must handle thousands of simultaneous connections. The question is: how does the server manage all those connections without exhausting system resources? Two fundamentally different answers have emerged over the decades, each with its own trade-offs in throughput, memory usage, and failure modes.

The Process-Per-Connection Model (Apache prefork)

Apache's prefork MPM (Multi-Processing Module) spawns a pool of worker processes at startup. It was long the default on Unix and is still what many mod_php installations run, although Apache 2.4 selects the event MPM by default where the platform supports it. When a request arrives, one idle process picks it up and owns it exclusively until the response is sent, including all the time spent waiting for the application backend to respond, database queries to complete, or slow clients to download the body.

[Diagram: Clients 1 to N connect to the Apache master process (prefork MPM), which forks workers at startup up to MaxRequestWorkers. Each worker process (~20-50 MB of RAM) is busy with exactly one client; when no worker is free, new connections queue or are refused.]
Apache prefork MPM: one OS process per connection, bounded by MaxRequestWorkers.

Each worker process carries the full Apache module stack in memory, typically 20–50 MB per process. With MaxRequestWorkers 256 (a common default), a fully loaded pool can consume roughly 5–12 GB of RAM. This is the setting for the classic C10K problem: once every process is occupied, new connections queue or are refused, even though the existing processes are doing no useful work and are merely blocked waiting for I/O.
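The memory figure above is plain multiplication. A minimal sketch, where the function name prefork_ram_gb is chosen here purely for illustration and gigabytes are decimal (1 GB = 1000 MB):

```python
def prefork_ram_gb(max_request_workers: int, mb_per_process: float) -> float:
    """Worst-case RAM in GB if every prefork worker is resident at once."""
    return max_request_workers * mb_per_process / 1000


low = prefork_ram_gb(256, 20)   # lean module stack
high = prefork_ram_gb(256, 50)  # heavy module stack
print(f"{low:.1f} to {high:.1f} GB")
```

Treat the result as an upper bound: forked processes share some memory pages, so measured usage is usually lower than the per-process figure multiplied out.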

The "stuck worker" failure mode: A slow upstream (a database or a slow API) holds each Apache worker blocked in a system call, and RAM fills with processes that are blocked rather than working. When traffic spikes past MaxRequestWorkers, the kernel's listen backlog fills and clients see timeouts or refused connections. This is not a code bug; it is an architectural limit of the process model under I/O-bound load.
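One way to put a number on this limit is Little's law: when each request pins one worker for its full duration, the sustainable request rate cannot exceed the worker count divided by the average time a worker is held. A rough sketch; the function name and the latency figures are illustrative assumptions, not measurements:

```python
def max_requests_per_second(workers: int, seconds_per_request: float) -> float:
    """Upper bound on throughput when every request occupies one worker."""
    return workers / seconds_per_request


# Healthy upstream: each request holds a worker for about 50 ms.
print(max_requests_per_second(256, 0.05))

# Slow upstream: each request holds a worker for 2 s, same pool size.
print(max_requests_per_second(256, 2.0))
```

The pool size did not change between the two calls; only upstream latency did. Arrivals above the lower ceiling pile up in the listen backlog until it fills.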

The Event-Driven Model (Nginx)

Nginx, first publicly released in 2004, was purpose-built to solve the C10K problem. It uses a fundamentally different architecture: a small, fixed number of worker processes (typically one per CPU core), each running a non-blocking event loop. Instead of dedicating an OS process to each connection, a single worker multiplexes thousands of connections using the kernel's epoll (Linux) or kqueue (BSD/macOS) I/O notification APIs.

[Diagram: 10K+ clients connect to two Nginx workers, one per CPU core, under a master process that reads the config and signals the workers. Each worker (~2-4 MB) runs an epoll event loop over many connections in different states (reading a request, writing to the upstream, sending a body, TLS handshake, idle keep-alive) and proxies to an upstream app, DB, or API.]
Nginx event-driven model: a fixed pool of workers (one per core) multiplex all connections via epoll.

When Nginx is waiting for a backend response, the worker does not block on that connection. It registers the socket with epoll and moves on to another connection that is ready, sleeping only when no connection has pending work. The kernel notifies the worker as soon as data arrives. As a result, 10,000 mostly idle keep-alive connections cost little more CPU than 10 active ones, because idle connections consume almost no cycles.
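The multiplexing idea can be sketched with Python's standard selectors module, which wraps epoll on Linux and kqueue on BSD/macOS. This illustrates the technique only and is not how Nginx itself is implemented; the function names, the socketpair-based stand-in clients, and the uppercase "response" are all assumptions made for the demo.

```python
import selectors
import socket


def serve_ready(sel: selectors.BaseSelector, timeout: float = 0.1) -> int:
    """Run one event-loop iteration: serve only the sockets that are readable."""
    served = 0
    for key, _events in sel.select(timeout):
        conn = key.fileobj
        data = conn.recv(4096)
        if data:
            conn.sendall(data.upper())  # stand-in for real request handling
            served += 1
        else:  # peer closed the connection
            sel.unregister(conn)
            conn.close()
    return served


def demo(total: int = 100, active: int = 3):
    """Open `total` connections, let `active` of them send data, serve once."""
    sel = selectors.DefaultSelector()
    clients, servers = [], []
    for _ in range(total):
        client, server = socket.socketpair()
        server.setblocking(False)
        sel.register(server, selectors.EVENT_READ)
        clients.append(client)
        servers.append(server)
    for client in clients[:active]:
        client.sendall(b"ping")
    served = serve_ready(sel)  # the idle connections are never touched
    replies = [client.recv(4096) for client in clients[:active]]
    sel.close()
    for sock in clients + servers:
        sock.close()
    return served, replies


if __name__ == "__main__":
    print(demo())
```

A single thread holds all 100 connections open, but the loop body runs only for the three that have data. That is the property that lets one worker carry thousands of idle keep-alive connections.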

Configuring the Nginx Worker Model

The main knobs in /etc/nginx/nginx.conf:

    # /etc/nginx/nginx.conf: top-level (main context)

    # Match the number of CPU cores available
    worker_processes auto;            # auto = detected at runtime

    # Max simultaneous connections PER worker
    # Total capacity = worker_processes * worker_connections
    events {
        worker_connections 4096;      # default 512; raise on high-traffic servers
        use epoll;                    # explicit on Linux; auto-selected anyway
        multi_accept on;              # accept all pending connections at once
    }

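On a 4-core machine with worker_connections 4096, Nginx can theoretically hold 4 × 4,096 = 16,384 simultaneous connections across four OS processes consuming perhaps 16–20 MB of RAM in total. (When Nginx is proxying, each client request also occupies an upstream connection that counts against worker_connections, so the client-facing figure is closer to half of that.) An Apache prefork setup holding the same number of connections would need one process per connection: thousands of processes and, at 20–50 MB each, far more RAM than most servers have.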
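The capacity arithmetic can be written down as a small helper. This is a back-of-the-envelope sketch: the function name and the proxying flag are invented for illustration, and the halving reflects the fact that worker_connections counts connections to proxied servers as well as connections from clients.

```python
def max_client_connections(worker_processes: int,
                           worker_connections: int,
                           proxying: bool = False) -> int:
    """Theoretical ceiling on simultaneous client connections."""
    total = worker_processes * worker_connections
    # A proxied request uses two connections: client side and upstream side.
    return total // 2 if proxying else total


print(max_client_connections(4, 4096))                 # static file serving
print(max_client_connections(4, 4096, proxying=True))  # reverse proxy
```

Real capacity is usually lower, since file descriptor limits, memory for buffers, and upstream capacity tend to bind first.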

Production tuning baseline: Set worker_processes auto and raise worker_connections to 4096 or 8192 on servers with more than 1 GB RAM. Also raise the OS file descriptor limit, because each connection needs a file descriptor. Add worker_rlimit_nofile 65535; to nginx.conf and set LimitNOFILE=65535 in the systemd unit, or add a nofile entry in /etc/security/limits.conf if Nginx is not started by systemd.
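A quick way to check whether a descriptor limit leaves room for a given worker_connections value is to compare the two directly. The sketch below uses Python's Unix-only resource module; the helper name and the reserve of 64 descriptors (for log files, listen sockets, and similar) are assumptions for illustration, not Nginx-documented values.

```python
import resource


def nofile_is_sufficient(soft_limit: int,
                         worker_connections: int,
                         reserve: int = 64) -> bool:
    """True if one worker can open worker_connections sockets plus a reserve."""
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= worker_connections + reserve


# Limits of the *current* process; Nginx may run under different ones.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft={soft} hard={hard} ok_for_4096={nofile_is_sufficient(soft, 4096)}")
```

The limits printed here belong to the Python process that runs the script. To see what a running Nginx worker actually has, inspect /proc/<pid>/limits for that worker on Linux.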

Where Apache Still Wins: Worker MPM and mod_php

Apache is not obsolete. Its worker MPM and event MPM are thread-based hybrids that significantly reduce memory vs. prefork. More importantly, mod_php embeds the PHP interpreter directly in the Apache process — a tight integration that avoids the overhead of a separate FastCGI process manager. Many shared hosting environments and legacy PHP stacks still rely on this model. For new workloads, however, Nginx + PHP-FPM has become the standard pattern (covered in Lesson 3).

Key Takeaways

  • Blocking I/O (Apache prefork): simple, isolated, but memory-hungry and bounded by process count under I/O-wait.
  • Non-blocking event loop (Nginx): scales to tens of thousands of connections with minimal RAM; CPU-bound work still blocks the worker.
  • Nginx's architecture makes it the dominant choice for reverse proxies, TLS termination, and static file serving — all workloads dominated by I/O wait, not CPU.
  • Both servers are correct tools: choose based on workload characteristics, not tribal preference.

The C10K article (Dan Kegel, 1999) is the original analysis of the problem and helped motivate event-driven servers such as Nginx and Node.js. If you want to understand why non-blocking I/O matters at scale, it is still worth reading. The architectural patterns it describes remain the foundation of modern high-concurrency servers.