Real-World System Design Case Studies

Design a URL Shortener

18 min Lesson 2 of 10

Design a URL Shortener

A URL shortener converts a long URL like https://www.example.com/products/shoes/mens/running?color=blue&size=10&ref=homepage_banner into something like https://tinyurl.com/ab3x9k. Deceptively simple on the surface, it is an excellent interview problem because it touches encoding, storage, caching, redirection, and scale all at once.

Step 1 — Clarify the Requirements

Before designing anything, pin down the numbers:

  • Write throughput: 100 million new URLs shortened per day (~1,160 writes/second).
  • Read throughput: 10:1 read/write ratio → ~11,600 redirects/second.
  • URL lifetime: 10 years by default, optional expiry.
  • Short code length: 6–8 characters, globally unique.
  • Storage estimate: 100 M URLs/day × 365 × 10 years = 365 billion rows. With ~500 bytes per row that is ~180 TB over 10 years.
Always state your assumptions out loud in an interview. The interviewer may immediately halve or double the scale, and that changes your entire architecture.

Step 2 — Pick a Short Code Strategy

The core question: how do you generate a short, unique, collision-free code?

Option A — Base-62 Encoding of a Counter

Maintain a global auto-increment counter. Encode the counter value in base-62 (digits 0–9, letters a–z, A–Z). A 7-character base-62 string supports 627 ≈ 3.5 trillion unique URLs — more than enough for 10 years.

Problem: a single counter is a bottleneck. You need a dedicated counter service (e.g., a Redis INCR or a Twitter Snowflake-style distributed counter). If the counter node fails, writes stall.

Option B — MD5 / SHA-256 + Truncate

Hash the long URL and take the first 7 characters. Very simple but has non-zero collision probability. Two different long URLs can produce the same 7-char prefix. Collisions require re-hashing with a salt or an extra uniqueness check.

Option C — Random Base-62 + DB Uniqueness Check

Generate a random 7-char base-62 token, attempt an INSERT. If it conflicts, retry. Collision probability at 365 B rows is still <0.01 %, acceptable in practice.

Recommended approach (real systems): Use a distributed ID generator (Snowflake, ULIDv2, or a range-based counter service) that hands out monotonically increasing ranges to each app server. Each server burns through its range locally with zero cross-server coordination. Encode the ID in base-62. Fast, collision-free, and horizontally scalable.

Step 3 — Database Schema

-- Core table (MySQL / PlanetScale / Vitess) CREATE TABLE urls ( id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY, short_code CHAR(8) NOT NULL, long_url TEXT NOT NULL, user_id BIGINT NULL, -- optional auth created_at DATETIME NOT NULL, expires_at DATETIME NULL, click_count BIGINT DEFAULT 0, UNIQUE KEY uk_short_code (short_code), INDEX ix_created (created_at) );

The UNIQUE KEY on short_code enforces uniqueness and makes the redirect lookup O(log n). At 365 B rows you will partition or shard this table — more on that in a moment.

Step 4 — The Redirect Flow

A redirect is a read-heavy, latency-sensitive operation. The naive approach — hit the database on every click — does not scale to 11,600 req/s with sub-10 ms SLAs.

URL Shortener Request Flow Client Browser / App Load Balancer Nginx / ALB App Server Redirect Logic App Server Redirect Logic Cache Redis Cluster Primary DB MySQL / Vitess cache miss lookup GET /ab3x9k 301 / 302 HIT (~1 ms) MISS fallback Async Click Counter App fires event → Kafka/SQS → Consumer increments DB click_count
Redirect request flow: cache-first lookup with async click counting.

Step 5 — 301 vs 302 Redirect

This is a subtle but important decision:

  • 301 Permanent: the browser caches the mapping forever. Subsequent clicks skip your servers entirely — zero analytics, but maximum performance.
  • 302 Temporary: the browser always asks your server again. You get accurate click analytics at the cost of every redirect hitting your stack.

Most commercial shorteners (Bitly, TinyURL) default to 302 so they can track clicks. If analytics are not needed, 301 dramatically reduces server load.

Step 6 — Scaling Reads with Cache

URL redirects are read-heavy with high temporal locality — a viral tweet might generate millions of hits on one short code in minutes. A Redis cluster with a write-through / read-aside strategy absorbs this spike:

  • On write: store short_code → long_url in Redis with TTL = URL expiry.
  • On redirect: check Redis first. On a hit return immediately (sub-millisecond). On a miss, query the DB and populate Redis.
  • Cache hit rate for popular URLs exceeds 99 %. Only cold or rare URLs touch the DB.

Step 7 — Scaling Writes with Sharding

At 365 billion rows over 10 years, a single MySQL instance is impractical. Common strategies:

  • Consistent hash sharding on short_code: route each code to one shard deterministically. Simple, but re-sharding is painful.
  • Range-based sharding on ID: IDs 0–1 B on shard 1, 1 B–2 B on shard 2, etc. Easy to reason about, but hot shards appear if recent IDs are accessed most.
  • Vitess / PlanetScale: MySQL-compatible with built-in transparent sharding — the recommended production choice.
URL Shortener Write Path with Counter Service and DB Shards Client POST long_url App Server Shorten API Counter Service Redis INCR / range Base-62 Encoder ID → short_code DB Shard 1 IDs 0 – 1 B DB Shard 2 IDs 1 B – 2 B DB Shard 3 IDs 2 B+ Response short_code get next ID INSERT
Write path: distributed counter assigns a unique ID, Base-62 encoding produces the short code, the record is inserted into the appropriate DB shard.

Step 8 — Custom Aliases and Expiry

Premium users often want tinyurl.com/my-brand. Custom aliases go through the same uniqueness check but skip the counter — you just try to INSERT the user-supplied code and return a conflict error if it is taken. For expiry, a background job scans for rows where expires_at < NOW() and either hard-deletes them or marks them inactive. Expired codes can then be recycled after a grace period.

Step 9 — Analytics

Counting clicks synchronously on every redirect would add database write load to your critical path. Instead, fire an event to a message queue (Kafka, SQS) and let an async consumer batch-update the click_count column. For richer analytics (geo, device, referrer), write each event to a columnar store (BigQuery, ClickHouse) — never to your operational MySQL shards.

Do not write click analytics synchronously. At 11,600 redirects/second, a synchronous DB increment becomes 11,600 writes/second on the same table your reads depend on. This is the most common mistake in URL shortener designs.

Summary: Key Design Decisions

  • Short code generation: distributed counter + base-62 encoding — collision-free and horizontally scalable.
  • Redirect: 302 for analytics, cache-first (Redis) to protect the DB.
  • Storage: sharded MySQL (or Vitess) for 365 B rows over 10 years.
  • Analytics: async via message queue — never on the redirect critical path.
  • Expiry: background worker, not in-line with the redirect.