Databases & Storage

SQL vs NoSQL

18 min Lesson 1 of 10

SQL vs NoSQL

Every system design decision starts with a deceptively simple question: where do we store the data, and how do we shape it? The answer splits into two broad families — relational (SQL) databases and non-relational (NoSQL) databases. Neither is universally better. Each makes different trade-offs, and knowing those trade-offs is what separates engineers who pick the right tool from engineers who spend months fighting the wrong one.

What Is a Relational Database?

A relational database organises data into tables (rows and columns) with a strict, predefined schema. Relationships between tables are expressed through foreign keys, and you query the data with SQL (Structured Query Language). The defining guarantee is ACID — Atomicity, Consistency, Isolation, Durability — which means every transaction either fully commits or fully rolls back, and two concurrent transactions cannot corrupt each other's data.

Well-known relational databases: PostgreSQL, MySQL / MariaDB, Oracle, Microsoft SQL Server, SQLite.

Real-world numbers: A well-tuned PostgreSQL instance on a single server comfortably handles 10,000–50,000 transactions per second (TPS). LinkedIn, GitHub, and Instagram all started on MySQL/PostgreSQL and scaled to hundreds of millions of users before adding any NoSQL layer.

What Is a NoSQL Database?

NoSQL ("Not Only SQL") is an umbrella term for databases that abandon the strict table-and-schema model in favour of flexible data models optimised for specific access patterns. The four main sub-types are:

  • Document stores — data as JSON/BSON documents (MongoDB, Couchbase). Each document is self-contained; schema is optional and can vary per document.
  • Key-value stores — data as opaque blobs addressed by a key (Redis, DynamoDB in its simplest usage). Extremely fast; no query language beyond get/set/delete.
  • Wide-column stores — rows with dynamic, per-row columns grouped into column families (Cassandra, HBase). Optimised for massive write throughput across many nodes.
  • Graph databases — data as nodes and edges (Neo4j, Amazon Neptune). Optimised for traversing deep, complex relationships (social graphs, fraud detection).

Most NoSQL databases trade full ACID guarantees for horizontal scalability and flexible schemas. Many offer eventual consistency by default, though several (like MongoDB since v4 and DynamoDB Transactions) now support multi-document ACID transactions.

SQL vs NoSQL data model comparison Relational (SQL) Database Document (NoSQL) Database users id | name | email 1 | Alice | a@x.com 2 | Bob | b@x.com orders id | user_id | total 1 | 1 | $120 2 | 1 | $45 FK JOIN to combine data ✓ Fixed schema enforced ✓ ACID transactions ✓ Complex queries (JOIN) ✕ Hard to scale horizontally ✕ Schema changes are costly Document (user + orders embedded): { "_id": 1, "name": "Alice", "email": "a@x.com", "orders": [ {"id":1,"total":"$120"}, {"id":2,"total":"$45"} ] } No JOIN needed — fetch once ✓ Flexible / schema-less ✓ Scales horizontally easily ✕ Weaker consistency defaults
SQL stores data across normalised tables linked by foreign keys; NoSQL documents embed related data together, eliminating joins.

The Core Trade-offs Side by Side

Dimension SQL (Relational) NoSQL (Non-Relational)
Data model Tables, rows, columns Documents, key-value, columns, graphs
Schema Strict, predefined Flexible, per-document
Scaling Vertical (bigger machine) + limited horizontal via read replicas Horizontal (add nodes), built-in sharding
Consistency Strong (ACID) Often eventual (BASE), tunable
Query power Rich SQL — joins, aggregates, window functions Optimised for specific access patterns; multi-table joins are hard
Transactions Multi-row/table, always ACID Traditionally single-document; multi-doc ACID now in MongoDB, DynamoDB
Best for Financial records, ERPs, anything requiring correctness above all Product catalogs, user profiles, real-time feeds, IoT, recommendation engines

Choosing in Practice — Decision Signals

Use SQL when:

  • Your data has complex, many-to-many relationships that you need to query flexibly (orders, inventory, billing).
  • You need strong consistency and multi-table transactions — e.g. deducting a bank balance and recording a ledger entry atomically.
  • The schema is well-understood and stable — schema enforcement prevents bad data from sneaking in.
  • Your team already speaks SQL fluently and the dataset fits on one powerful server (or a primary + read replicas).

Use NoSQL when:

  • You need to scale writes horizontally beyond what a single relational primary can handle — e.g., Cassandra absorbing 1 million writes/sec across a 20-node cluster.
  • Your data structure varies per record — a product catalog where a TV has completely different attributes than a shirt.
  • You are storing time-series, log, or event data at very high velocity — relational tables with fixed columns become unwieldy.
  • Your primary access pattern is a single-key lookup or a document fetch by ID — NoSQL returns it in under 1 ms with no joins.
The CAP Theorem framing: Relational databases traditionally prioritise Consistency and Partition tolerance (CP), accepting that under a network partition they will refuse writes rather than return stale data. Many NoSQL databases prioritise Availability and Partition tolerance (AP), returning a possibly stale response rather than failing. Understanding this trade-off is fundamental to system design.
When to choose SQL vs NoSQL decision flow New Data Store? Complex relationships / joins? Yes Use SQL No Need strong consistency (ACID)? Yes Use SQL No Massive scale / flexible schema / single-key access? Yes NoSQL No Default: use SQL
A practical decision flow for choosing between SQL and NoSQL when designing a new data store.

Polyglot Persistence — The Real-World Answer

Large systems almost never use only one database type. At Uber, the trip data lives in MySQL/PostgreSQL for financial correctness, while the real-time geolocation of millions of drivers is stored in Redis (key-value) for sub-millisecond lookup. At Amazon, product metadata is in DynamoDB (document/key-value), order history is in Aurora (relational MySQL-compatible), and recommendation graphs use a graph store. This approach — using multiple database types, each for what it does best — is called polyglot persistence.

Default to SQL first. Relational databases are battle-tested, operationally simple, and give you powerful querying tools you will almost certainly need. Introduce a NoSQL store only when you have a clear, specific reason: a scale requirement you have measured, a data shape that genuinely does not fit tables, or an access pattern (e.g., time-series, graph traversal) where a specialised store dramatically outperforms SQL. Premature NoSQL adoption is one of the most common causes of unnecessary data consistency bugs in early-stage systems.
Do not choose NoSQL to avoid schema design. "Schema-less" does not mean "schema-free" — it means the schema is implicit and enforced in application code instead of the database. You still need to design your data model carefully; the difference is that the database will no longer catch inconsistencies for you. Teams that skip data modelling and reach for MongoDB "for flexibility" often end up with inconsistent documents and complex application-level validation logic.

Summary

  • SQL databases give you a rigid schema, powerful joins, and ACID guarantees. Scale them vertically and via read replicas. Best for transactional, relational, correctness-critical data.
  • NoSQL databases trade strict consistency for horizontal scalability and flexible data models. Best for specific, high-volume access patterns at scale.
  • In practice, mature systems use both — SQL as the backbone, NoSQL for specialised workloads.
  • The choice should be driven by your access patterns, consistency requirements, and expected scale — not by hype or familiarity alone.