Databases & Storage

Relational Databases & ACID

18 min Lesson 2 of 10

Relational Databases & ACID

Relational databases have been the backbone of production systems since the 1970s. PostgreSQL powers GitHub, Instagram, and Shopify. MySQL runs WordPress, YouTube (historically), and countless SaaS products. Understanding why they are built the way they are — tables, keys, relationships, and the ACID contract — is essential before you can make informed decisions about when to use them and when to reach for something else.

The Relational Model in Brief

A relation (table) is a set of tuples (rows) that all share the same schema (columns). Each column has a declared type and optional constraints. Every row in a properly designed table is uniquely identified by a primary key — a minimal set of columns whose values never repeat across rows.

Tables connect to each other through foreign keys: a column in one table that references the primary key of another. This enforces referential integrity — you cannot create an order for a customer that does not exist, and you cannot delete a customer who has orders (unless you cascade or set null).

Three related tables — one customer has many orders, one order has many items. Foreign keys enforce referential integrity.

Normalization: Why Avoid Redundancy

Normalization is the process of structuring tables to reduce data duplication and prevent update anomalies. The classic example: if you store the customer's email on every row of the orders table, an email change requires updating thousands of rows — and if one row is missed, your data is inconsistent. Keeping it in one place (the customers table) means one update, always correct.

The practical rules to remember: each non-key column should depend on the whole key, nothing but the key. In practice this means putting each entity in its own table and referencing it by ID.

Normalization vs. performance: Highly normalized schemas require JOIN operations to reassemble data. JOINs across large tables with proper indexes are fast, but they add complexity. Many high-scale systems selectively denormalize hot query paths — a trade-off covered in Lesson 7.

Transactions: The Unit of Work

A transaction is a sequence of SQL statements that the database treats as a single logical operation. The canonical example is a bank transfer: debit account A, credit account B. Both must succeed, or neither should persist. You never want a state where A is debited but B is not credited.

BEGIN;
  UPDATE accounts SET balance = balance - 500 WHERE id = 42;
  UPDATE accounts SET balance = balance + 500 WHERE id = 99;
COMMIT;

-- If anything fails before COMMIT, issue:
ROLLBACK;  -- undoes both updates

The database guarantees that either the commit lands atomically, or the rollback leaves both rows completely unchanged. This is the foundation of the ACID contract.

ACID — The Four Guarantees

ACID is an acronym for four properties that every relational transaction must satisfy. Each property solves a concrete failure mode.

The four ACID guarantees, their real-world meaning, and the mechanism each database uses to implement them.

A Closer Look at Each Property

Atomicity — The database uses a Write-Ahead Log (WAL). Before any data page is changed on disk, the intended change is written to the WAL. If the server crashes mid-transaction, recovery replays completed transactions and rolls back incomplete ones. From the application's perspective, a transaction either fully happened or never happened at all.

Consistency — This is partially the database's job (enforcing constraints) and partially the application's job (not violating business rules). PostgreSQL checks NOT NULL, UNIQUE, CHECK, and FOREIGN KEY constraints at commit time. If any check fails, the whole transaction is rolled back automatically.

Isolation — Isolation is the most nuanced property. The SQL standard defines four isolation levels that trade off correctness against concurrency:

Read Uncommitted — can see dirty (uncommitted) writes from other transactions. Almost never used.
Read Committed — sees only committed data. The default in PostgreSQL and most systems. Prevents dirty reads but allows non-repeatable reads (re-reading a row within the same transaction may yield different values).
Repeatable Read — a snapshot of the data is taken at the start of the transaction; re-reads are stable. Prevents dirty and non-repeatable reads. The default in MySQL/InnoDB.
Serializable — transactions execute as if run one after the other. Highest correctness, highest overhead. Use for financial ledgers, inventory deduction, or anywhere "phantom reads" would cause bugs.

Practical default: Start with Read Committed for most web API workloads. Step up to Repeatable Read or Serializable only for transactions that aggregate data and then write decisions based on it — such as "check seat count, then book seat".

Most modern databases implement isolation through MVCC (Multi-Version Concurrency Control): each writer creates a new version of the row rather than locking the row in-place. Readers see the version current as of their snapshot timestamp. This allows high read concurrency without blocking writes.

Durability — A COMMIT does not return success until the WAL record is flushed to durable storage (fsync to disk or SSD). This is why commits on spinning disks can take milliseconds. Cloud databases (Aurora, Cloud SQL) replicate WAL entries across multiple availability zones — the commit returns only after a quorum of nodes acknowledge the write, giving durability that survives an entire datacenter failure.

Where ACID Costs You: Performance Trade-offs

ACID guarantees are not free. Each property has a price:

Atomicity & Durability — Every commit requires an fsync. On a single node, PostgreSQL can sustain roughly 1,000–5,000 write transactions per second before the WAL becomes the bottleneck. Using group commit (batching multiple transactions into one fsync) raises this to tens of thousands.
Isolation — Higher isolation levels require more locking or more MVCC versions, increasing memory pressure and the chance of deadlocks. Serializable isolation on a busy OLTP database can cut throughput by 30–50 % compared to Read Committed.
Consistency — Foreign key constraints require an index lookup on every INSERT or DELETE. Removing foreign keys (a common NoSQL migration pattern) removes this overhead — but also removes the safety net.

Do not disable durability for throughput — some tools allow setting synchronous_commit = off in PostgreSQL, which means COMMIT returns before the WAL is flushed. This improves write latency but introduces a risk window of up to ~200 ms of committed data loss on crash. This is a valid trade-off only for data you can afford to regenerate (analytics events, log ingestion) — never for financial data or user records.

Choosing a Relational Database for Scale

At small scale (millions of rows, a few hundred QPS) any properly indexed relational database is fast. As you scale:

Read scaling — add read replicas (covered in Lesson 5). Reads fan out; writes still go to one primary.
Write scaling — vertical scaling (bigger machine) buys time. Beyond that, partitioning/sharding (Lesson 6) or moving certain workloads to purpose-built stores.
Connection pooling — PostgreSQL spawns a process per connection; at 10,000 connections memory use alone is ~10 GB. PgBouncer (transaction-mode pooling) lets thousands of app threads share dozens of actual DB connections.

Rule of thumb: a well-tuned single PostgreSQL instance on modern hardware (32-core, NVMe SSD) handles hundreds of thousands of reads per second and tens of thousands of transactional writes per second. Most applications never outgrow a single primary. Add replicas before you consider NoSQL.

Relational databases are not a legacy technology — they are the right default for structured, relational data where correctness matters. The ACID contract is what lets you trust the numbers you read back from the database, which is exactly the foundation every other design decision in this course is built on.