Design a URL Shortener
Design a URL Shortener
A URL shortener converts a long URL like https://www.example.com/products/shoes/mens/running?color=blue&size=10&ref=homepage_banner into something like https://tinyurl.com/ab3x9k. Deceptively simple on the surface, it is an excellent interview problem because it touches encoding, storage, caching, redirection, and scale all at once.
Step 1 — Clarify the Requirements
Before designing anything, pin down the numbers:
- Write throughput: 100 million new URLs shortened per day (~1,160 writes/second).
- Read throughput: 10:1 read/write ratio → ~11,600 redirects/second.
- URL lifetime: 10 years by default, optional expiry.
- Short code length: 6–8 characters, globally unique.
- Storage estimate: 100 M URLs/day × 365 × 10 years = 365 billion rows. With ~500 bytes per row that is ~180 TB over 10 years.
Step 2 — Pick a Short Code Strategy
The core question: how do you generate a short, unique, collision-free code?
Option A — Base-62 Encoding of a Counter
Maintain a global auto-increment counter. Encode the counter value in base-62 (digits 0–9, letters a–z, A–Z). A 7-character base-62 string supports 627 ≈ 3.5 trillion unique URLs — more than enough for 10 years.
Problem: a single counter is a bottleneck. You need a dedicated counter service (e.g., a Redis INCR or a Twitter Snowflake-style distributed counter). If the counter node fails, writes stall.
Option B — MD5 / SHA-256 + Truncate
Hash the long URL and take the first 7 characters. Very simple but has non-zero collision probability. Two different long URLs can produce the same 7-char prefix. Collisions require re-hashing with a salt or an extra uniqueness check.
Option C — Random Base-62 + DB Uniqueness Check
Generate a random 7-char base-62 token, attempt an INSERT. If it conflicts, retry. Collision probability at 365 B rows is still <0.01 %, acceptable in practice.
Step 3 — Database Schema
The UNIQUE KEY on short_code enforces uniqueness and makes the redirect lookup O(log n). At 365 B rows you will partition or shard this table — more on that in a moment.
Step 4 — The Redirect Flow
A redirect is a read-heavy, latency-sensitive operation. The naive approach — hit the database on every click — does not scale to 11,600 req/s with sub-10 ms SLAs.
Step 5 — 301 vs 302 Redirect
This is a subtle but important decision:
- 301 Permanent: the browser caches the mapping forever. Subsequent clicks skip your servers entirely — zero analytics, but maximum performance.
- 302 Temporary: the browser always asks your server again. You get accurate click analytics at the cost of every redirect hitting your stack.
Most commercial shorteners (Bitly, TinyURL) default to 302 so they can track clicks. If analytics are not needed, 301 dramatically reduces server load.
Step 6 — Scaling Reads with Cache
URL redirects are read-heavy with high temporal locality — a viral tweet might generate millions of hits on one short code in minutes. A Redis cluster with a write-through / read-aside strategy absorbs this spike:
- On write: store
short_code → long_urlin Redis with TTL = URL expiry. - On redirect: check Redis first. On a hit return immediately (sub-millisecond). On a miss, query the DB and populate Redis.
- Cache hit rate for popular URLs exceeds 99 %. Only cold or rare URLs touch the DB.
Step 7 — Scaling Writes with Sharding
At 365 billion rows over 10 years, a single MySQL instance is impractical. Common strategies:
- Consistent hash sharding on
short_code: route each code to one shard deterministically. Simple, but re-sharding is painful. - Range-based sharding on ID: IDs 0–1 B on shard 1, 1 B–2 B on shard 2, etc. Easy to reason about, but hot shards appear if recent IDs are accessed most.
- Vitess / PlanetScale: MySQL-compatible with built-in transparent sharding — the recommended production choice.
Step 8 — Custom Aliases and Expiry
Premium users often want tinyurl.com/my-brand. Custom aliases go through the same uniqueness check but skip the counter — you just try to INSERT the user-supplied code and return a conflict error if it is taken. For expiry, a background job scans for rows where expires_at < NOW() and either hard-deletes them or marks them inactive. Expired codes can then be recycled after a grace period.
Step 9 — Analytics
Counting clicks synchronously on every redirect would add database write load to your critical path. Instead, fire an event to a message queue (Kafka, SQS) and let an async consumer batch-update the click_count column. For richer analytics (geo, device, referrer), write each event to a columnar store (BigQuery, ClickHouse) — never to your operational MySQL shards.
Summary: Key Design Decisions
- Short code generation: distributed counter + base-62 encoding — collision-free and horizontally scalable.
- Redirect: 302 for analytics, cache-first (Redis) to protect the DB.
- Storage: sharded MySQL (or Vitess) for 365 B rows over 10 years.
- Analytics: async via message queue — never on the redirect critical path.
- Expiry: background worker, not in-line with the redirect.