Tuning & Troubleshooting
A correctly installed Nginx serves requests. A correctly tuned Nginx can serve many times more on the same hardware, and when something does go wrong, its logs usually tell you why. This lesson covers three interlocking topics that every production engineer must own: worker process and connection limits, file descriptor headroom, and systematic log analysis.
Worker Processes and Worker Connections
Nginx uses a master + worker architecture. The master process owns the listening sockets and reloads config; worker processes do the actual I/O — reading requests, calling upstreams, sending responses. Workers are single-threaded and event-driven, so they can juggle thousands of simultaneous connections each, without threads or locks.
The two directives that define your server's top-end capacity live in nginx.conf: worker_processes at the top level and worker_connections in the events block.
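A minimal sketch of where these two directives sit is shown below. The value 4096 is an illustrative assumption taken from the range this lesson suggests, not a measured recommendation for any particular host:

```nginx
# Main (top-level) context of nginx.conf
worker_processes auto;        # start one worker per available CPU core

events {
    # Maximum simultaneous connections a single worker may hold open.
    # This count includes connections to upstreams, not only clients.
    # The compiled-in default is 512.
    worker_connections 4096;
}
```

With these settings the theoretical ceiling is roughly worker_processes × worker_connections, provided the file descriptor limits discussed in the next section allow it.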
How high should worker_connections be? Each open connection consumes roughly one file descriptor and a small amount of memory (8–16 KB depending on buffer sizes). A safe ceiling is: worker_connections = (available_file_descriptors / worker_processes) * 0.9. On a server with 65 535 FDs and 4 workers that gives you 65 535 / 4 * 0.9 ≈ 14 700 per worker, far above the compiled-in default of 512 (many packaged nginx.conf files ship with 1024). Raise it to 4096 or 8192 for traffic-heavy front-ends.
File Descriptors: The Hidden Bottleneck
Every open socket, file, or pipe counts against the operating system's file descriptor (FD) limit. Under load, an under-configured server logs "Too many open files" errors and starts rejecting connections, a hard failure that looks like random timeouts to users.
There are two limits you must raise in concert:
- OS-level (system-wide): /proc/sys/fs/file-max, the absolute ceiling for all processes on the machine.
- Process-level (per-process): the ulimit -n of the Nginx worker process, controlled in nginx.conf via worker_rlimit_nofile.
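On systemd-managed hosts, the service unit carries its own limit, LimitNOFILE. Check it with systemctl show nginx | grep LimitNOFILE. To raise it, create a drop-in: run systemctl edit nginx, add [Service] / LimitNOFILE=65535, then systemctl daemon-reload && systemctl restart nginx. A restart is needed here because a reload does not re-apply resource limits to the master process that is already running. Set both the systemd limit and worker_rlimit_nofile; they are independent knobs.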
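The Nginx side of this can be sketched roughly as follows. The numbers 65535 and 8192 are illustrative assumptions chosen to match the figures used in this lesson; size them against your own OS and systemd limits:

```nginx
# Main (top-level) context of nginx.conf
worker_processes auto;

# Sets the open-file limit (RLIMIT_NOFILE) for worker processes.
# Illustrative value; keep it at or below what the OS allows.
worker_rlimit_nofile 65535;

events {
    # If every client connection is proxied to an upstream, a worker
    # may need about 8192 * 2 = 16384 descriptors, well under 65535.
    worker_connections 8192;
}
```

Keeping worker_rlimit_nofile comfortably above twice worker_connections leaves room for log files, cached file handles, and other descriptors a worker holds open.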
Reading Nginx Access Logs
The access log is your real-time window into production traffic. Every request Nginx handled appears here. The default combined log format captures the IP, timestamp, HTTP method, URI, status code, bytes sent, referer, and user-agent — enough to reconstruct exactly what happened.
Add $request_time and $upstream_response_time to your log format. The default format does not include these. Without them you cannot distinguish "the time went to Nginx or the client's network" (high $request_time, low $upstream_response_time) from "the backend was slow" (both high). Add a custom log format in nginx.conf and use it on all production server blocks:
log_format timed_combined '$remote_addr - $remote_user [$time_local] '
'"$request" $status $body_bytes_sent '
'"$http_referer" "$http_user_agent" '
'rt=$request_time urt=$upstream_response_time';
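Defining a format does not activate it; each server block has to reference it by name. A minimal sketch of that step, assuming the log_format definition above sits in the http context, and using a hypothetical host name and a commonly packaged log path:

```nginx
server {
    listen 80;
    server_name example.com;   # hypothetical host name

    # The second parameter selects the named format defined above.
    # The path is an assumption; use whatever your distro ships.
    access_log /var/log/nginx/access.log timed_combined;
}
```

Each request then ends with a pair such as rt=… urt=…, where urt is "-" for requests that never reached an upstream.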
Reading Nginx Error Logs
Error logs capture everything Nginx could not handle gracefully: upstream timeouts, worker crashes, permission errors, config mistakes picked up at runtime, and resource exhaustion. The default level is error. You can temporarily lower the threshold to warn or info for debugging, but never leave debug on in production: it emits many lines for every request, can fill a disk quickly under load, and only works on binaries built with --with-debug.
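One way to raise verbosity without flooding the main log is to override error_log for a single server block. The sketch below assumes a hypothetical host name and log paths; the directive and its level argument are standard:

```nginx
# Main context: severity threshold for the whole instance.
error_log /var/log/nginx/error.log error;

http {
    server {
        listen 80;
        server_name example.com;   # hypothetical host name

        # Temporary override while debugging this one site.
        # Revert to the default once the issue is understood.
        error_log /var/log/nginx/example.error.log info;
    }
}
```

The level names a minimum severity, so info also records notice, warn, error, and everything more severe.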
Putting It All Together: A Tuning Checklist
When preparing a server for production or diagnosing a performance regression, run through this sequence:
- Run nproc and confirm worker_processes auto; is set, or pin it explicitly.
- Check worker_connections: for a front-end reverse proxy serving 10 k+ concurrent users, 4096–8192 is a good starting point.
- Verify worker_rlimit_nofile matches or exceeds worker_connections * 2 (each connection to a client plus one to an upstream).
- Confirm the systemd LimitNOFILE is aligned.
- Add $request_time and $upstream_response_time to your log format before you go live; retrofitting logs after an incident is too late.
- Set up log rotation with logrotate (included in most distros) to prevent access logs from consuming all disk space over weeks of traffic.
Run nginx -T (capital T) to dump the full parsed config. This is one of the most effective ways to confirm that an include, a server block, or a directive is actually part of the configuration. Like nginx -t, it tests the files on disk, so the dump shows what the next reload would apply, not necessarily what the running workers have loaded.