Linux Fundamentals

Processes & Signals


Every program running on a Linux system is a process — a running instance of a program with its own PID (Process ID), memory space, file descriptors, and scheduling slot. In production, you constantly interact with processes: you inspect them to diagnose performance issues, send signals to reload config without downtime, and manage priority to protect critical workloads from noisy neighbours. This lesson gives you the full toolkit.

Inspecting Processes with ps

ps is a snapshot tool — it shows the state of processes at the moment you run it. The most useful invocation is ps aux, which lists all processes from all users in a human-readable format.

    # Full process list: the standard production incantation
    ps aux
    # Columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    # VSZ = virtual memory (KB), RSS = resident (physical) memory (KB)

    # Show a process tree, great for spotting zombie parents
    ps auxf

    # Find a specific service without grep showing itself
    ps aux | grep '[n]ginx'

    # Show only PID, name, CPU, mem for a specific user
    ps -u deploy -o pid,comm,%cpu,%mem --sort=-%cpu
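To see which services hold the most resident memory, the RSS column can be totalled per command name. The sketch below is one way to do that; the function name rss_by_command and the sample figures are illustrative and do not come from a real host. RSS counts shared pages once for every process that maps them, so the totals overstate real usage for forking servers.

```shell
# Sum resident memory per command name from "RSS COMMAND" lines on stdin.
# ps reports RSS in KiB; the output is in MiB, largest first.
# Assumes command names without spaces (only the first word is used).
rss_by_command() {
    awk '{ kib[$2] += $1 }
         END { for (name in kib) printf "%.1f %s\n", kib[name] / 1024, name }' |
        sort -rn
}

# Made-up sample in the shape that `ps -eo rss=,comm=` prints.
printf '%s\n' '204800 postgres' '102400 nginx' '51200 nginx' | rss_by_command

# On a live host: ps -eo rss=,comm= | rss_by_command | head
```

The trailing `=` after each column name in the live command suppresses the header line, which keeps the output safe to pipe into awk.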

The STAT column is important in production diagnostics. R = running or runnable, S = interruptible sleep (waiting for an event such as input, a timer, or a network packet), D = uninterruptible sleep (usually disk I/O; a high D count signals a storage bottleneck), Z = zombie (the process has exited but its parent has not yet called wait()), T = stopped by a job-control signal.
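A quick way to spot an unusual number of D or Z processes is to count processes by the first letter of STAT. This is a minimal sketch; summarise_states is a hypothetical helper name and the sample input is invented for illustration.

```shell
# Count processes per state, keyed on the first letter of the STAT value.
# Reads one STAT value per line on stdin (for example "Ss", "R+", "D").
summarise_states() {
    awk '{ count[substr($1, 1, 1)]++ }
         END { for (state in count) print state, count[state] }' | sort
}

# Invented sample standing in for `ps -eo stat=` output.
printf '%s\n' 'Ss' 'R+' 'S' 'D' 'Z' 'S<' | summarise_states

# On a live host: ps -eo stat= | summarise_states
```

The extra characters after the first letter (s, +, < and so on) are modifiers such as session leader or foreground process group, so the sketch ignores them.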

Zombie processes do not consume CPU or memory, but each one keeps a process-table entry and therefore a PID. In a containerised microservice that forks heavily (Node.js cluster, PHP-FPM) without a proper init process, accumulated zombies can exhaust the PIDs available to the container (its pids cgroup limit, or the kernel's pid_max), which is a real production outage vector. Use tini or dumb-init as PID 1 inside containers so that orphaned children are reaped correctly.
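A zombie cannot be killed, because it has already exited; it disappears only when its parent calls wait() or the parent itself exits. The useful thing to find is therefore the parent. The sketch below groups zombies by parent PID; zombie_parents is a hypothetical helper and the sample process list is made up.

```shell
# Print "PPID COUNT" for every parent that currently has zombie children.
# Reads "PID PPID STAT COMMAND" lines on stdin.
zombie_parents() {
    awk '$3 ~ /^Z/ { zombies[$2]++ }
         END { for (ppid in zombies) print ppid, zombies[ppid] }' | sort -n
}

# Made-up sample in the shape that `ps -eo pid=,ppid=,stat=,comm=` prints.
printf '%s\n' \
    '101 1 Ss init' \
    '210 101 S worker' \
    '211 210 Z worker' \
    '212 210 Z+ worker' \
    '300 101 Z helper' | zombie_parents

# On a live host: ps -eo pid=,ppid=,stat=,comm= | zombie_parents
```

A parent that shows up here with a growing count is the process to fix or restart.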

Real-Time Monitoring with top and htop

top is the classic interactive process monitor. The header gives a system-level summary (load average, CPU steal, memory), and the body shows the top consumers sorted by CPU. Essential key bindings inside top: press M to sort by memory, P to sort by CPU, k to kill a process by PID, 1 to expand per-CPU view (critical on multi-core servers), and q to quit.

htop is the modern replacement — it adds colour, mouse support, per-core meters, and a tree view. Install it with your package manager and use htop -u nginx to filter by user immediately.

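Load average context: the three numbers in top's header (e.g. 2.40 1.87 1.52) are the 1-minute, 5-minute, and 15-minute load averages. On Linux the figure counts tasks that are runnable plus tasks in uninterruptible sleep (state D), so it reflects demand for CPU and, indirectly, for disk. As a rule of thumb, a load average equal to the number of CPU cores means the CPUs are fully occupied: on a 4-core machine a sustained load of 4.0 is roughly 100% utilisation, and 8.0 means work is queuing. Watch the trend: a 1-minute figure well above the 15-minute figure means load is rising, and a climbing 15-minute average is an early warning sign.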
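Because the raw number only means something relative to the core count, it helps to normalise it. The following sketch divides a load figure by the number of cores; load_per_core is a hypothetical helper name, and the live part assumes a Linux host where /proc/loadavg and nproc are available.

```shell
# load_per_core LOAD CORES: print load divided by core count, two decimals.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# The 4-core example from the text: a load of 8.0 is twice what the CPUs can absorb.
load_per_core 8.0 4

# Live reading, if /proc/loadavg is readable on this host.
if [ -r /proc/loadavg ]; then
    read -r one five fifteen _ < /proc/loadavg
    echo "1-min load per core: $(load_per_core "$one" "$(nproc)")"
fi
```

A result near 1.00 means the cores are fully occupied; values above that mean tasks are waiting.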

Signals: Communicating with Processes

Signals are asynchronous notifications sent to a process. The kernel or any authorised user can send them. Run kill -l to see the full list. The ones you use in production every day:

Signal    Number  Catchable  What happens
SIGTERM   15      Yes        Graceful shutdown: flush buffers, close connections
SIGKILL   9       No         Kernel destroys the process immediately; no cleanup
SIGHUP    1       Yes        Reload config by convention (nginx, sshd) with zero downtime; the default action is to terminate
SIGINT    2       Yes        Ctrl+C: interrupt, same intent as SIGTERM
SIGUSR1   10      Yes        App-defined: nginx and unicorn reopen their log files (log rotation)
SIGSTOP   19      No         Pause a process (Ctrl+Z sends the catchable SIGTSTP instead)
SIGCONT   18      Yes        Resume a stopped process

Common Linux signals: number, catchability, and what they do to the target process. Numbers are the x86 and ARM values; a few differ on other architectures.
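"Catchable" means a process can install its own handler for the signal. The sketch below shows the idea with the shell's trap builtin: a child shell installs a SIGTERM handler, the parent sends SIGTERM, and the child runs its cleanup before exiting. The variable names and the "cleaned up" message are illustrative only.

```shell
log=$(mktemp)

# Child shell: install a SIGTERM handler, announce readiness, then idle.
sh -c 'trap "echo cleaned up; exit 0" TERM
       echo ready
       while :; do sleep 0.1; done' > "$log" &
worker=$!

# Wait (up to about 5 s) until the handler is installed before signalling.
tries=0
until grep -q ready "$log" || [ "$tries" -ge 50 ]; do
    sleep 0.1
    tries=$((tries + 1))
done

kill -TERM "$worker"

# Collect the child's exit status; 0 means the handler ran and exited cleanly.
worker_status=0
wait "$worker" || worker_status=$?
cat "$log"
echo "worker exit status: $worker_status"
```

Replacing TERM with KILL in the kill command would end the child without the handler ever running, which is exactly what "not catchable" means.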

    # Send SIGTERM (graceful shutdown): always try this first
    kill 1234
    kill -15 1234
    kill -SIGTERM 1234   # all three are equivalent

    # Force-kill when the process is stuck and ignoring SIGTERM
    kill -9 1234

    # Reload nginx config without dropping connections
    kill -HUP $(cat /var/run/nginx.pid)

    # Or using systemd (preferred in modern systems)
    systemctl reload nginx

    # Kill by name: sends SIGTERM to ALL matching processes
    pkill nginx
    pkill -HUP nginx     # SIGHUP goes to every nginx process; the master performs the reload

    # Kill all processes owned by a user (use with care!)
    pkill -u olduser

Never reach for SIGKILL first. A process killed with kill -9 cannot close database connections cleanly, flush its write buffers, or remove its PID and lock files. This can leave half-written files, force a database into crash recovery, and leave stale lock files that block the next startup. Always send SIGTERM first, wait 5-30 seconds, and only escalate to SIGKILL if the process refuses to exit.
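The escalation rule above can be wrapped in a small helper. This is a minimal sketch rather than a hardened tool: graceful_stop is a hypothetical function name, the default timeout of 10 seconds is an arbitrary choice, and it polls with kill -0, which only tests whether the PID still exists.

```shell
# graceful_stop PID [TIMEOUT]: send SIGTERM, escalate to SIGKILL after TIMEOUT seconds.
graceful_stop() {
    pid=$1
    timeout=${2:-10}
    kill -TERM "$pid" 2>/dev/null || return 0     # process already gone
    waited=0
    while kill -0 "$pid" 2>/dev/null; do          # still present?
        if [ "$waited" -ge "$timeout" ]; then
            echo "PID $pid still present after ${timeout}s, sending SIGKILL" >&2
            kill -KILL "$pid" 2>/dev/null || true
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
}

# Demonstration against a harmless background sleep.
sleep 300 &
target=$!
graceful_stop "$target" 3

# A process ended by SIGTERM reports status 128 + 15.
target_status=0
wait "$target" || target_status=$?
echo "target exit status: $target_status"
```

In a real deployment script the PID would come from a PID file or from systemd, and the timeout should match how long the service needs to drain connections.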

Foreground, Background, and Job Control

The shell tracks jobs — groups of processes tied to the current terminal session. You can run commands in the background to keep the shell free, and move jobs between foreground and background on the fly.

    # Run a command in the background by appending & to the command
    ./long-migration.sh &
    # Output: [1] 4821   ← [job number] PID

    # See all background jobs in the current shell
    jobs -l

    # Bring job 1 back to the foreground
    fg %1

    # Suspend the foreground process (sends SIGTSTP)
    # Press: Ctrl+Z
    # Output: [1]+ Stopped ./long-migration.sh

    # Resume job 1 in the background
    bg %1

    # Detach a process from the terminal entirely (survives logout)
    nohup ./long-migration.sh > migration.log 2>&1 &

    # The modern, preferred alternative: tmux or screen
    tmux new-session -d -s migration './long-migration.sh'
    tmux attach -t migration

Production habit: never run long operations (database migrations, large data exports, log processing) in a bare SSH terminal without tmux or screen. If your SSH connection drops mid-run, the process receives SIGHUP and dies. Use tmux — it keeps sessions alive on the server independent of your connection.
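Inside a script, where interactive job control is normally switched off, the counterparts of the job table are the $! variable and the wait builtin. The sketch below starts a background job, records its PID, and collects its exit status; the subshell that exits with status 3 is a stand-in for a real long-running command.

```shell
# Start a job in the background; $! holds the PID the shell just launched.
( sleep 0.2; exit 3 ) &
job_pid=$!
echo "started background job with PID $job_pid"

# wait blocks until that PID exits and returns the job's exit status.
job_status=0
wait "$job_pid" || job_status=$?
echo "job exited with status $job_status"
```

Capturing the status this way lets a deployment script run several steps in parallel and still fail loudly if any one of them fails.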

Process Priority with nice and renice

Linux schedules processes using a niceness value from -20 (highest priority, rudest to other processes) to +19 (lowest priority, most polite). The default is 0. Only root (or a process with the CAP_SYS_NICE capability) can set negative, higher-priority values.

    # Start a CPU-intensive backup at low priority so it doesn't starve the web server
    nice -n 15 tar -czf /backup/app.tar.gz /var/www/app

    # Lower the priority of an already-running process
    renice +10 -p 4821

    # Find current niceness values (NI column)
    ps -eo pid,comm,ni --sort=-ni | head -20

    # Combine with ionice to limit both CPU and disk I/O priority
    ionice -c 3 nice -n 19 ./heavy-analytics-job.sh

Big-tech context: on shared infrastructure, running batch jobs (log shipping, analytics, scheduled reports) at a niceness of around +15 is standard practice. Niceness is a scheduling weight rather than a hard guarantee: when the host is under load, the request-serving processes receive a much larger share of CPU time and the batch job yields; when the host is idle, the batch job still runs at full speed. Kubernetes and cgroups enforce this at a higher level in container environments, but the underlying scheduler concept is the same.
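One detail worth seeing in action: the value given to nice -n is an adjustment added to the niceness of the calling shell, and children inherit the result. Running nice with no command prints the current niceness, which makes this easy to check. The variable names below are illustrative.

```shell
# Niceness of the current shell (nice with no command prints it).
base=$(nice)

# Niceness seen by a child that was started with an adjustment of +5.
child=$(nice -n 5 nice)
echo "shell niceness: $base, child niceness: $child"

# The result is base + 5, capped at the maximum of 19.
expected=$((base + 5))
if [ "$expected" -gt 19 ]; then
    expected=19
fi
echo "expected child niceness: $expected"
```

This is why a job launched from an already-niced cron wrapper can end up at a lower priority than the number on its own command line suggests.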

Finding a Process by Port or File

Two more utilities complete your daily toolkit: lsof (list open files) and fuser. Both let you find which process owns a resource — essential for diagnosing "address already in use" errors on deployment.

    # Who is listening on port 8080?
    lsof -i :8080

    # Or with ss (more modern than netstat)
    ss -tlnp | grep :8080

    # Which process has a specific file open?
    lsof /var/log/app.log

    # Kill whatever process still holds port 8080 (fuser -k sends SIGKILL by default, so use with care)
    fuser -k 8080/tcp
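A plain grep for :8080 also matches ports such as 18080, so a deployment pre-flight check is better off matching the local address column exactly and extracting the PIDs. The sketch below does that for text in the shape that ss -tlnp prints; pids_on_port is a hypothetical helper, and the sample lines (process names, PIDs, addresses) are invented for illustration.

```shell
# pids_on_port PORT: read `ss -tlnp` style lines on stdin and print the
# PIDs of processes listening on exactly that port, one per line.
pids_on_port() {
    awk -v suffix=":$1" '
        $1 == "LISTEN" &&
        substr($4, length($4) - length(suffix) + 1) == suffix { print }
    ' | grep -o 'pid=[0-9]*' | cut -d= -f2 | sort -un
}

# Invented sample standing in for real `ss -tlnp` output.
sample_ss_output='LISTEN 0 511 0.0.0.0:8080 0.0.0.0:* users:(("nginx",pid=1234,fd=6),("nginx",pid=1233,fd=6))
LISTEN 0 128 0.0.0.0:22 0.0.0.0:* users:(("sshd",pid=801,fd=3))
LISTEN 0 128 127.0.0.1:18080 0.0.0.0:* users:(("app",pid=999,fd=3))'

printf '%s\n' "$sample_ss_output" | pids_on_port 8080

# On a live host (process details for other users need root): ss -tlnp | pids_on_port 8080
```

An empty result means the port is free, so the new release can bind to it without an "address already in use" error.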

Combining these tools — ps, top, kill, jobs, nice, and lsof — gives you full visibility and control over every running process on any Linux server. The patterns here are identical whether you're debugging a bare-metal host or exec-ing into a Kubernetes pod.