SQL vs NoSQL
SQL vs NoSQL
Every system design decision starts with a deceptively simple question: where do we store the data, and how do we shape it? The answer splits into two broad families — relational (SQL) databases and non-relational (NoSQL) databases. Neither is universally better. Each makes different trade-offs, and knowing those trade-offs is what separates engineers who pick the right tool from engineers who spend months fighting the wrong one.
What Is a Relational Database?
A relational database organises data into tables (rows and columns) with a strict, predefined schema. Relationships between tables are expressed through foreign keys, and you query the data with SQL (Structured Query Language). The defining guarantee is ACID — Atomicity, Consistency, Isolation, Durability — which means every transaction either fully commits or fully rolls back, and two concurrent transactions cannot corrupt each other's data.
Well-known relational databases: PostgreSQL, MySQL / MariaDB, Oracle, Microsoft SQL Server, SQLite.
Real-world numbers: A well-tuned PostgreSQL instance on a single server comfortably handles 10,000–50,000 transactions per second (TPS). LinkedIn, GitHub, and Instagram all started on MySQL/PostgreSQL and scaled to hundreds of millions of users before adding any NoSQL layer.
What Is a NoSQL Database?
NoSQL ("Not Only SQL") is an umbrella term for databases that abandon the strict table-and-schema model in favour of flexible data models optimised for specific access patterns. The four main sub-types are:
- Document stores — data as JSON/BSON documents (MongoDB, Couchbase). Each document is self-contained; schema is optional and can vary per document.
- Key-value stores — data as opaque blobs addressed by a key (Redis, DynamoDB in its simplest usage). Extremely fast; no query language beyond get/set/delete.
- Wide-column stores — rows with dynamic, per-row columns grouped into column families (Cassandra, HBase). Optimised for massive write throughput across many nodes.
- Graph databases — data as nodes and edges (Neo4j, Amazon Neptune). Optimised for traversing deep, complex relationships (social graphs, fraud detection).
Most NoSQL databases trade full ACID guarantees for horizontal scalability and flexible schemas. Many offer eventual consistency by default, though several (like MongoDB since v4 and DynamoDB Transactions) now support multi-document ACID transactions.
The Core Trade-offs Side by Side
| Dimension | SQL (Relational) | NoSQL (Non-Relational) |
|---|---|---|
| Data model | Tables, rows, columns | Documents, key-value, columns, graphs |
| Schema | Strict, predefined | Flexible, per-document |
| Scaling | Vertical (bigger machine) + limited horizontal via read replicas | Horizontal (add nodes), built-in sharding |
| Consistency | Strong (ACID) | Often eventual (BASE), tunable |
| Query power | Rich SQL — joins, aggregates, window functions | Optimised for specific access patterns; multi-table joins are hard |
| Transactions | Multi-row/table, always ACID | Traditionally single-document; multi-doc ACID now in MongoDB, DynamoDB |
| Best for | Financial records, ERPs, anything requiring correctness above all | Product catalogs, user profiles, real-time feeds, IoT, recommendation engines |
Choosing in Practice — Decision Signals
Use SQL when:
- Your data has complex, many-to-many relationships that you need to query flexibly (orders, inventory, billing).
- You need strong consistency and multi-table transactions — e.g. deducting a bank balance and recording a ledger entry atomically.
- The schema is well-understood and stable — schema enforcement prevents bad data from sneaking in.
- Your team already speaks SQL fluently and the dataset fits on one powerful server (or a primary + read replicas).
Use NoSQL when:
- You need to scale writes horizontally beyond what a single relational primary can handle — e.g., Cassandra absorbing 1 million writes/sec across a 20-node cluster.
- Your data structure varies per record — a product catalog where a TV has completely different attributes than a shirt.
- You are storing time-series, log, or event data at very high velocity — relational tables with fixed columns become unwieldy.
- Your primary access pattern is a single-key lookup or a document fetch by ID — NoSQL returns it in under 1 ms with no joins.
Polyglot Persistence — The Real-World Answer
Large systems almost never use only one database type. At Uber, the trip data lives in MySQL/PostgreSQL for financial correctness, while the real-time geolocation of millions of drivers is stored in Redis (key-value) for sub-millisecond lookup. At Amazon, product metadata is in DynamoDB (document/key-value), order history is in Aurora (relational MySQL-compatible), and recommendation graphs use a graph store. This approach — using multiple database types, each for what it does best — is called polyglot persistence.
Summary
- SQL databases give you a rigid schema, powerful joins, and ACID guarantees. Scale them vertically and via read replicas. Best for transactional, relational, correctness-critical data.
- NoSQL databases trade strict consistency for horizontal scalability and flexible data models. Best for specific, high-volume access patterns at scale.
- In practice, mature systems use both — SQL as the backbone, NoSQL for specialised workloads.
- The choice should be driven by your access patterns, consistency requirements, and expected scale — not by hype or familiarity alone.