Connection Management & Pooling
Connection Management & Pooling
Every PostgreSQL or MySQL connection is a heavyweight OS process (or thread). At peak traffic, naive application designs open a new connection per request — and at scale, that turns into a production outage faster than almost any other database anti-pattern. Understanding how connections are actually managed, and why a pooler belongs between your application and your database, is a non-negotiable skill for any engineer running databases at production scale.
What max_connections Actually Means
In PostgreSQL, max_connections is a hard cap on the total number of backend processes the server will accept — including your application, replication standbys, background workers, and your own admin sessions. The default is 100. Each connection consumes roughly 5–10 MB of RAM for its backend process plus shared memory overhead. On a 16 GB instance, allowing 500 connections consumes 2.5–5 GB before a single query executes.
The critical insight: most connections are idle most of the time. A typical web application with 200 concurrent users does not need 200 active database connections — it needs at most a fraction of that, because database operations last milliseconds. But without a pooler, most frameworks open and hold one connection per thread or goroutine, which means 200 threads means 200 connections.
MySQL uses max_connections the same way, but its per-connection cost differs because MySQL uses a thread model rather than a forked-process model. The memory math is still critical — MySQL allocates per-thread buffers (sort_buffer_size, join_buffer_size, etc.) on demand, so 1,000 connections executing large sorts can exhaust RAM abruptly.
max_connections, new connection attempts are rejected with an error — not queued. Your application now throws exceptions for every request. The exceptions generate retries, which generate more connection attempts, which are also rejected. This positive-feedback loop can hold a service down for minutes even after the root cause is resolved. A connection pooler prevents this by queuing new requests at the pooler level instead of hammering the database.
The Pooler Tier: PgBouncer and ProxySQL
PgBouncer is the de-facto standard connection pooler for PostgreSQL. It is a single-threaded C process that sits between your application and PostgreSQL. PgBouncer maintains a small number of long-lived server connections and multiplexes thousands of short-lived client connections across them. It is lightweight enough to run on the same host as the database without meaningful CPU impact.
PgBouncer operates in three modes:
- Session pooling: A server connection is assigned to a client for the duration of the client session. This is the safest mode and compatible with all PostgreSQL features (including
SET,LISTEN, and prepared statements), but provides the least multiplexing benefit. - Transaction pooling: A server connection is assigned only for the duration of a transaction, then returned to the pool. This is the recommended production mode for most workloads — it allows hundreds of client connections to share a small server pool. Restrictions: prepared statements and advisory locks must be managed at the application layer, or you use PgBouncer 1.21+ with its server-side prepared-statement tracking.
- Statement pooling: A server connection is returned after each statement. Rarely used because it breaks multi-statement transactions.
ProxySQL is the equivalent for MySQL and MariaDB, but with a significantly broader feature set: query routing rules, read/write splitting, query rewriting, multiplexed connections, and rich observability via its internal stats tables. ProxySQL is commonly used in front of Galera clusters and MySQL Group Replication to route writes to the primary and reads to replicas transparently.
Configuring PgBouncer for Production
The minimal production PgBouncer configuration for a PostgreSQL primary. This file lives at /etc/pgbouncer/pgbouncer.ini:
echo "md5$(echo -n 'passwordusername' | md5sum | awk '{print $1}')". For SCRAM, use psql -c "SELECT rolpassword FROM pg_authid WHERE rolname = 'appuser';" on the database and copy the SCRAM-SHA-256$... string directly into userlist.txt. In production, use auth_query to have PgBouncer query the database for credentials directly, eliminating the sync problem entirely.
Configuring ProxySQL for MySQL
ProxySQL is configured via its admin interface (a MySQL-protocol connection on port 6032) rather than a config file, which makes it suitable for programmatic management and Terraform. The key tables are mysql_servers, mysql_users, and mysql_query_rules.
Pool Sizing: The Formula and the Reality
The most-cited pool sizing rule comes from the HikariCP documentation, which itself derives from database benchmarking research: pool size = (core count × 2) + effective spindle count. For a typical 8-core application server with SSD storage, that yields a pool of around 17 connections. This number shocks engineers who assume bigger is better.
The logic: a CPU can only execute one thread at a time per core. If you have more database connections than CPU cores on the database host, you are simply serializing work on the database side with more overhead (context switching, memory pressure) than fewer connections would incur. Saturating the database CPU is the failure mode, not the goal.
In practice, size your pool by starting with the formula and validating against the following signals in production:
pg_stat_activityactive count — how many connections are genuinely executing queries (state = 'active') vs idle?- PgBouncer
cl_waiting— clients waiting for a server connection. If this is non-zero, your pool is undersized for peak load. - Database CPU utilization — if it is under 60% at peak, you have headroom; if it is above 80%, adding more pool connections will make latency worse, not better.
- Query latency p99 — should not degrade as pool size increases. If it does, you are thrashing the database.
max_connections based on instance RAM using the formula LEAST({DBInstanceClassMemory/9531392}, 5000). A db.t3.micro (1 GB RAM) gets approximately 87 connections. With 3 application servers each holding a pool of 30, you have already used all of them. RDS Proxy (built on PgBouncer/ProxySQL architecture) exists precisely for this problem and is the recommended solution for Lambda or ECS workloads that open many short-lived connections.
Production Failure Modes and How to Detect Them
Three connection-related failure modes recur across production incidents at every scale:
1. Connection leaks: Application code acquires a connection and fails to release it (uncaught exception, missing finally block, ORM bug). Connections accumulate until the pool is exhausted. Detect with: SELECT count(*), state FROM pg_stat_activity GROUP BY state; — a growing idle count with no corresponding load is a leak.
2. Long-running idle-in-transaction: A connection starts a transaction, does some work, then hangs (waiting for user input, blocked on an external API call). The transaction holds locks. Other queries queue behind it. The pool looks "full" even though most connections are idle. Detect with:
3. Pool starvation under burst traffic: A sudden spike in request rate causes all pool connections to be in use simultaneously. New requests queue at the pooler. If the queue exceeds query_wait_timeout, PgBouncer returns an error to the client. This is correct behaviour — the pooler protects the database — but the application must handle this error gracefully (exponential backoff, circuit breaker) rather than crashing.
Connection management is not glamorous infrastructure. It rarely comes up in architecture reviews. But pool exhaustion is one of the most common causes of database-induced outages at scale — and a correctly tuned pooler in front of your database is one of the highest-leverage reliability improvements you can make with a single afternoon of work.