Back-of-the-Envelope Estimation
Before you draw a single architecture box, you need to know the scale of the problem you are solving. Back-of-the-envelope estimation is the skill of quickly calculating order-of-magnitude numbers — queries per second, storage needs, bandwidth, and memory — using simple arithmetic and a handful of memorised constants. In a system design interview, and in real engineering planning, these calculations set the constraints that drive every decision that follows.
The Reference Numbers You Must Memorise
Good estimators do not look numbers up mid-conversation. They keep a small table in their head:
- Latency landmarks: L1 cache ~0.5 ns | RAM read ~100 ns | SSD read ~100 µs | HDD seek ~10 ms | cross-datacenter RTT ~150 ms
- Throughput landmarks: SSD sequential ~500 MB/s | network (1 Gbps NIC) ~125 MB/s | typical DB row read ~1 µs of CPU
- Data sizes: ASCII char = 1 B | UUID = 36 B as text (16 B binary) | average tweet ≈ 280 B | thumbnail ≈ 50 KB | HD photo ≈ 3 MB | 4-min MP3 ≈ 4 MB | 1 min of 720p video ≈ 50 MB
- Powers of 2 / 10 equivalences: 2^10 ≈ 10^3 (1 KB ≈ 1,000 B), 2^20 ≈ 10^6 (1 MB), 2^30 ≈ 10^9 (1 GB), 2^40 ≈ 10^12 (1 TB)
- Time conversions: 1 day = 86,400 s ≈ 10^5 s | 1 month ≈ 2.5 × 10^6 s | 1 year ≈ 3 × 10^7 s
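How rough these round numbers are can be checked in a few lines. The sketch below is only an illustration: the helper name `approx_error` is made up here, and the month and year constants assume a 30-day month and a 365-day year.

```python
def approx_error(exp2: int, exp10: int) -> float:
    """Relative amount by which 2**exp2 exceeds 10**exp10."""
    return 2**exp2 / 10**exp10 - 1


# Compare each binary size with the decimal shorthand used for it.
for exp2, exp10, unit in [(10, 3, "KB"), (20, 6, "MB"), (30, 9, "GB"), (40, 12, "TB")]:
    error = approx_error(exp2, exp10)
    print(f"2^{exp2} vs 10^{exp10} ({unit}): binary value is {error:.1%} larger")

SECONDS_PER_DAY = 24 * 60 * 60             # 86,400 exactly
SECONDS_PER_MONTH = 30 * SECONDS_PER_DAY   # 2,592,000 (assumes a 30-day month)
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY   # 31,536,000 (assumes a 365-day year)
```

The gap grows from about 2% at the KB level to about 10% at the TB level, which is small enough to ignore when the goal is an order-of-magnitude answer.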
The Four Pillars of Estimation
Every back-of-the-envelope exercise produces four numbers. Let us walk through each one using a concrete example: a Twitter-like microblogging service with 300 million monthly active users (MAU), of whom 10% post once per day and 100% read.
1. Queries Per Second (QPS)
QPS is the heartbeat of your system. Everything — thread pools, connection limits, rate limiters, load balancer capacity — is sized against it.
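Write QPS (new tweets): 300 M users × 10% = 30 M tweets per day. Dividing by 86,400 s gives ≈ 350 writes per second on average, or roughly 1,000 per second at peak, assuming a 3× peak factor.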
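The write-side arithmetic can be sketched in Python as follows. The 300 M MAU and 10% posting rate come from the example; the helper name `per_second` and the 3× peak factor are assumptions of this sketch, with the factor picked from the 2–5× range because it lands near 1K writes per second.

```python
SECONDS_PER_DAY = 86_400


def per_second(events_per_day: float, peak_factor: float = 1.0) -> float:
    """Convert a daily event count to a per-second rate, scaled by a peak multiplier."""
    return events_per_day / SECONDS_PER_DAY * peak_factor


mau = 300_000_000
posting_fraction = 0.10                      # from the example: 10% post once per day
tweets_per_day = mau * posting_fraction      # 30 million tweets per day

avg_write_qps = per_second(tweets_per_day)                   # about 347
peak_write_qps = per_second(tweets_per_day, peak_factor=3)   # about 1,042 (3x is assumed)

print(f"average write QPS ≈ {avg_write_qps:,.0f}")
print(f"peak write QPS    ≈ {peak_write_qps:,.0f}")
```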
Read QPS: Assume each active reader reads a timeline of 20 tweets, triggering 1 DB or cache read per tweet shown.
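The read side follows the same pattern. The text does not show the inputs behind its read figure, so the values below are purely illustrative assumptions: 75 M daily readers (25% of MAU) and a 1.5× peak factor are one hypothetical combination that lands near 26,000 peak reads per second. The function name `read_qps` is also made up for this sketch.

```python
SECONDS_PER_DAY = 86_400


def read_qps(daily_readers: int, reads_per_reader: int, peak_factor: float = 1.0) -> float:
    """Per-second read rate from a daily reader count and the reads each reader triggers."""
    return daily_readers * reads_per_reader / SECONDS_PER_DAY * peak_factor


daily_readers = 75_000_000     # ASSUMPTION: 25% of 300 M MAU read on a given day
tweets_per_timeline = 20       # from the text: one DB or cache read per tweet shown

avg_read_qps = read_qps(daily_readers, tweets_per_timeline)
peak_read_qps = read_qps(daily_readers, tweets_per_timeline, peak_factor=1.5)  # ASSUMED 1.5x

print(f"average read QPS ≈ {avg_read_qps:,.0f}")
print(f"peak read QPS    ≈ {peak_read_qps:,.0f}")
```

The result is very sensitive to the daily-reader count: if all 300 M users read every day, the same formula gives roughly 69,000 reads per second on average, so it is worth stating that input out loud.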
A single well-tuned relational database can handle ~10,000 simple reads per second. At 26,000 peak reads per second, we already know we need a read replica or a cache layer — and we have not even talked about the schema yet. That is the power of this calculation.
2. Storage
Storage estimation tells you what kind of data infrastructure you need and how fast you will grow.
That 10.8 TB fits on a single large NVMe array, but now add media: suppose 10% of tweets carry a 1 MB photo.
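Both storage figures come from the same multiplication: items per day × bytes per item × days retained. In the sketch below, the ~200 B stored per tweet is an assumption (the text does not show its per-tweet size), chosen because it lands close to the text total quoted above; the photo inputs come from the paragraph itself. Decimal units (1 MB = 10^6 B) and a 365-day year are also assumed.

```python
MB, TB, PB = 10**6, 10**12, 10**15   # decimal units, as used for quick estimates


def storage_bytes(items_per_day: int, bytes_per_item: int, years: int = 5) -> int:
    """Total bytes accumulated over `years`, assuming 365 days per year."""
    return items_per_day * bytes_per_item * 365 * years


tweets_per_day = 30_000_000

# Text: ~200 B per stored tweet is an ASSUMED size, not a figure from the text.
text_bytes = storage_bytes(tweets_per_day, 200)

# Media: 10% of tweets carry a 1 MB photo (from the text).
photos_per_day = tweets_per_day // 10
photo_bytes = storage_bytes(photos_per_day, 1 * MB)

print(f"text over 5 years   ≈ {text_bytes / TB:.2f} TB")    # 10.95 TB
print(f"photos over 5 years ≈ {photo_bytes / PB:.3f} PB")   # 5.475 PB
```

The photo total is about 500 times the text total, which is why the two are sized and stored separately.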
5.5 petabytes of photos in five years instantly tells us: we need an object store (S3, GCS, or similar), not a relational database BLOB column. Media storage is always separated from transactional storage at scale.
3. Bandwidth
Bandwidth governs your network infrastructure costs, CDN requirements, and whether a single region is enough.
2.2 GB/s average outbound means you need a CDN for media and likely multiple PoPs (Points of Presence) around the world. A single datacenter network card at 10 Gbps (~1.25 GB/s) would already be saturated — another design driver you discovered purely from arithmetic.
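The NIC comparison is a unit conversion: network links are quoted in bits per second, payload sizes in bytes. The sketch below takes the 2.2 GB/s figure as given and checks it against a 10 Gbps link; the helper name is made up for illustration, and protocol overhead is ignored.

```python
import math


def gbps_to_gb_per_s(gigabits_per_s: float) -> float:
    """Convert link speed in gigabits/s to gigabytes/s (8 bits per byte, no overhead)."""
    return gigabits_per_s / 8


outbound_gb_per_s = 2.2                     # average outbound figure from the text
nic_gb_per_s = gbps_to_gb_per_s(10)         # 1.25 GB/s for a 10 Gbps NIC

nics_needed = math.ceil(outbound_gb_per_s / nic_gb_per_s)   # at least 2 links, before headroom
daily_tb = outbound_gb_per_s * 86_400 / 1_000               # about 190 TB leaving per day

print(f"10 Gbps NIC ≈ {nic_gb_per_s} GB/s, links needed ≥ {nics_needed}")
print(f"daily egress ≈ {daily_tb:.0f} TB")
```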
4. Memory (Cache Sizing)
Caches are effective only when the hot dataset fits in RAM. The question is: how much RAM do you actually need?
The classic 80/20 rule: 20% of the content generates 80% of the reads. Cache that 20%.
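As a rough sketch, the hot-set size is the number of candidate items × bytes per item × the hot fraction. The text does not say which dataset its cache figure is based on, so the 7-day window below is an assumption made only for illustration; the 280 B tweet size comes from the reference table, and per-key cache overhead is ignored.

```python
GB = 10**9


def hot_set_bytes(items: int, bytes_per_item: int, hot_fraction: float = 0.20) -> float:
    """Bytes needed to hold the hot fraction of a dataset (no per-key overhead)."""
    return items * bytes_per_item * hot_fraction


tweets_per_day = 30_000_000
window_days = 7                   # ASSUMPTION: only the last week of tweets is cache-worthy
candidate_tweets = tweets_per_day * window_days

cache_bytes = hot_set_bytes(candidate_tweets, 280)   # 20% of one week of 280 B tweets

print(f"hot set ≈ {cache_bytes / GB:.1f} GB")
```

Under these assumptions the hot set comes to about 12 GB. Doubling the window or adding per-key overhead roughly doubles it, so the answer is best quoted as a range.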
10–20 GB of Redis RAM can absorb the 26,000 peak read QPS almost entirely, leaving the database free for writes and cache-miss reads. That is the architectural insight the memory estimate delivers.
Putting It All Together: The Estimation Summary Diagram
[Figure: estimation summary diagram]
A Visual Guide to Data Size Intuition
[Figure: visual guide to data size intuition]
Common Mistakes and How to Avoid Them
- Forgetting the peak factor. Average QPS is rarely what breaks systems. Always multiply by 2–5× to model peak traffic (events, viral moments, midnight cron bursts).
- Ignoring replication overhead. If you write to a primary database that replicates to two replicas, your write amplification is 3×. Factor this into storage and I/O estimates.
- Mixing compressed and uncompressed numbers. Be explicit: "3 TB/day uncompressed; with a 3:1 compression ratio, we store ~1 TB/day." Never mix the two silently.
- Treating DAU as MAU. DAU is typically 20–50% of MAU for consumer apps, so using MAU for daily calculations inflates every number.
- Not stating assumptions. In interviews and engineering docs, state every assumption explicitly — "I am assuming 10% of users post once per day" — so reviewers can challenge the inputs, not your arithmetic.
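Several of these corrections are simple multipliers that can be applied explicitly. The sketch below uses the numbers from the list (3 TB/day uncompressed, 3:1 compression, one primary plus two replicas, DAU at 20–50% of MAU); the function names are hypothetical.

```python
def stored_tb_per_day(raw_tb_per_day: float, compression_ratio: float = 1.0, copies: int = 1) -> float:
    """Physical TB written per day after compression, across all replicated copies."""
    return raw_tb_per_day / compression_ratio * copies


def dau_range(mau: int, low: float = 0.20, high: float = 0.50) -> tuple[int, int]:
    """Plausible daily-active-user range for a given monthly-active-user count."""
    return round(mau * low), round(mau * high)


one_copy = stored_tb_per_day(3.0, compression_ratio=3.0)              # 1.0 TB/day
all_copies = stored_tb_per_day(3.0, compression_ratio=3.0, copies=3)  # 3.0 TB/day

print(f"compressed, single copy: {one_copy} TB/day")
print(f"compressed, 3 copies:    {all_copies} TB/day")
print(f"DAU range for 300 M MAU: {dau_range(300_000_000)}")
```

In this example, compression and three-way replication cancel out, which is exactly why each factor should be stated separately rather than folded silently into one number.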
The Estimation in an Interview
In a 45-minute system design interview, spend 5–8 minutes on estimation. The goal is not to get a mathematically perfect answer — it is to demonstrate structured thinking. A good pattern:
- State the scale input: "With 300 M MAU and 10% posting daily…"
- Derive write QPS, then read QPS (with assumed read/write ratio).
- Size storage for 5 years. Note whether media separates into an object store.
- Sanity-check bandwidth. Note CDN requirement if outbound exceeds ~1 GB/s.
- Size the cache. State the hot-data percentage assumption.
- Summarise: "So we are looking at roughly 1K write QPS, 26K read QPS, ~10 TB text storage, ~5 PB media over 5 years, 2+ GB/s outbound, and a 10–20 GB cache."
That six-line summary already tells an experienced engineer what the architecture must include before you sketch a single box.