Real-World System Design Case Studies

Capstone: Design a System End-to-End

18 min Lesson 10 of 10

Capstone: Design a System End-to-End

This capstone lesson ties every concept from the course into a single, realistic design exercise. We will design a social video platform — think YouTube at reduced but still serious scale — by running the full five-phase framework from scratch: clarify, estimate, API, high-level architecture, and deep-dive trade-offs. Every decision is explained with why, not just what.

By the end you will have one complete reference design you can adapt to any system design interview or greenfield project.

Phase 1 — Clarify Requirements

Functional requirements (the must-haves):

Users can upload videos (up to 4 GB, any common format).
Videos are transcoded to multiple resolutions (360p, 720p, 1080p) and served adaptively.
Users can search videos by title and tag.
A personalised home feed shows recommended videos.
Users can like, comment, and subscribe to channels.
View counts are eventually consistent — real-time accuracy is not required.

Non-functional requirements (these drive the architecture):

Scale: 50 million daily active users (DAU); 500,000 video uploads per day; 5 billion video view-seconds per day.
Latency: Video playback must start within 2 seconds globally (p95).
Availability: 99.95 % uptime for the playback path; upload can tolerate brief degradation.
Durability: Uploaded source videos must never be lost (no data loss SLA).
Consistency: Like counts and view counts are eventually consistent. Subscriptions are strongly consistent (you must never miss a sub notification).

Key idea: Separating the upload/transcode path from the playback path lets you apply different reliability and latency budgets to each. Playback is the revenue-critical hot path; upload is a background task users tolerate minor delays on.

Phase 2 — Estimate Scale

Upload QPS: 500,000 uploads/day ÷ 86,400 s ≈ 6 uploads/second average; 30/s peak (5x factor).
Playback QPS: 5 B view-seconds/day ÷ average video length 300 s = 16.7 M concurrent streams; divided by 86,400 s ≈ 193 k view starts/second peak.
Storage — raw uploads: 500,000 × avg 500 MB = 250 TB/day of raw source video. At 3 resolutions each ≈ 0.4× the original, transcode output adds ~300 TB/day. Over 1 year: ~200 PB. Object storage (S3 / GCS) is the only practical answer.
Bandwidth — egress: 193 k streams × 2 Mbps avg bitrate = ~386 Gbps. A CDN is not optional — it is a hard architectural requirement at this scale.
Metadata DB: 500,000 new video rows/day × 2 KB per row = 1 GB/day of metadata. After 5 years ≈ 1.8 TB — easily fits in a sharded relational DB or a distributed document store.

Phase 3 — Core API Surface

POST   /v1/videos/initiate-upload
  body: { title, description, tags[], duration_s, file_size_bytes }
  response: { video_id, upload_url }          -- pre-signed S3 URL

PUT    <upload_url>                           -- direct client-to-S3 upload (bypasses app servers)

GET    /v1/videos/{video_id}
  response: { video_id, title, manifest_url, like_count, view_count, channel }

GET    /v1/feed?user_id=X&cursor=Y
  response: { videos[], next_cursor }

POST   /v1/videos/{video_id}/likes
  body: { user_id }
  response: { new_like_count }

GET    /v1/search?q=term&page=N
  response: { videos[], total_hits }

Handing the client a pre-signed URL for direct upload to object storage is a critical design decision: it removes the app servers entirely from the upload data path. The app server only handles metadata; the raw bytes go straight from the client to S3.

Phase 4 — High-Level Architecture

The system decomposes naturally into two independent planes:

Write plane (upload): Client → App Server (metadata) + direct S3 upload → S3 event → Transcode Queue → Transcode Workers → Transcoded videos back to S3 → CDN invalidation.
Read plane (playback): Client → CDN → Origin (for cache misses only) → App Server → Metadata DB / Cache → HLS manifest → CDN segments.

Full end-to-end architecture: the upload plane (top) is fully decoupled from the playback plane (bottom). The CDN absorbs the vast majority of playback traffic; app servers only handle cache misses and metadata lookups.

Phase 5 — Deep-Dive: The Five Hardest Problems

Every large system has a small number of genuinely hard sub-problems. For our video platform they are: transcode pipeline reliability, CDN cache hit rate, view-count accuracy vs performance, feed generation at scale, and search freshness. We tackle each briefly.

1. Transcode Pipeline Reliability

An S3 event triggers a message on a Kafka topic. A pool of stateless transcode workers consumes jobs. Each worker downloads the raw source, runs FFmpeg to produce HLS segments at three resolutions, and uploads them back to S3. The worker then publishes a transcode.completed event that the metadata service consumes to flip the video status from processing to published.

Key trade-off: at-least-once delivery (Kafka default) means a worker can crash mid-job and the message is redelivered. Workers must be idempotent — re-transcoding and overwriting S3 output is safe because the final write is atomic from S3's perspective.

2. CDN Cache Hit Rate

Video segments are immutable once transcoded (a 2-second HLS chunk never changes). Set a Cache-Control: max-age=31536000, immutable header. The CDN will cache them for a year at every edge node. Hit rates above 90 % are achievable for popular content. For the long tail (videos with few views), cache misses are acceptable — the origin (S3) can handle the load since it is object storage, not a database.

Best practice: Version your manifest URLs (/v/{video_id}/{quality}/manifest.m3u8?gen=3) so that re-transcodes do not serve stale manifests. The manifest itself should have a short TTL (60 s) while the segments have a year-long TTL.

3. View Count — Eventual Consistency Done Right

Incrementing a SQL counter on every view at 193 k/s would kill a relational database. The classic solution is a two-stage pipeline:

Each app server buffers view events locally and flushes them in batches every 10 seconds to a stream (Kafka).
A Flink or Spark Streaming job aggregates counts from the stream and writes periodic snapshots back to the database (every 30 seconds).

The user sees a count that is at most 40 seconds stale. For a social video platform, that is perfectly acceptable. The database never sees more than a few hundred writes per second regardless of view volume.

4. Feed Generation — Fan-out on Write vs Fan-out on Read

Fan-out on write (push) gives fast reads but is expensive for high-follower creators. Fan-out on read (pull) is cheap to write but slow to read. Real systems use a hybrid: push for normal users, pull for celebrities.

5. Search Freshness

When a video is published, its metadata must appear in search results within a few seconds. The metadata service publishes a video.published event; a separate indexing worker consumes it and calls the Elasticsearch bulk API to index title, tags, description, and a pre-computed quality score (views, likes, age). Elasticsearch handles inverted-index updates within ~1 second, meeting the freshness requirement without touching the metadata DB from the search path.

Putting It All Together — Key Design Decisions Table

Synthesis: Every decision below is a trade-off, not an absolute truth. The right answer depends on your constraints, and being able to articulate that is what separates a senior engineer from a junior one.

Decision	Choice	Why
Video storage	Object store (S3)	Unlimited scale, 11-nine durability, cheap at petabyte scale
Upload path	Pre-signed URL (client → S3 direct)	Removes app servers from the data path; scales upload bandwidth independently
Transcode coordination	Kafka queue + stateless workers	Decouples upload from transcode; workers are horizontally scalable; at-least-once delivery with idempotent jobs
Video delivery	CDN + HLS (Adaptive Bitrate)	Global low-latency; client adapts quality to network; immutable segments cache forever
Metadata storage	Sharded MySQL + Redis cache	ACID for writes; cache absorbs read QPS (video metadata is read-heavy, write-light)
View counts	Buffered batch writes via Kafka	Avoids hot-row contention; eventual consistency is acceptable for counters
Feed generation	Hybrid push/pull	Push for normal users (fast reads); pull for celebrities (avoids fan-out storms)
Search	Elasticsearch (event-driven indexing)	Full-text and faceted search; decoupled from DB; ~1 s indexing latency is acceptable

Common interview pitfall: Over-engineering from day one. Do not add a service mesh, a sidecar proxy, a GraphQL gateway, and a CQRS event store before you have justified the need. Every layer adds latency, operational burden, and failure surface. Add complexity only when you can point to the specific bottleneck or requirement that forces it.

What You Have Now

You have walked a complete system design — from a vague prompt to a production-credible architecture — using the same five-phase methodology you can apply to any problem. The patterns that appeared here (pre-signed uploads, CDN-first delivery, async transcode pipelines, hybrid fan-out, event-driven search indexing, eventual-consistent counters) are the same patterns that power YouTube, Netflix, TikTok, and every other large-scale video platform in production today.

Take this reference design, adapt the constraints, swap the components for the ones your problem demands, and you will have a rigorous, trade-off-driven answer every time.

Previous Lesson Design a Distributed Key-Value Store

Back to Course System Design

Back to Course

Capstone: Design a System End-to-End

Capstone: Design a System End-to-End

Phase 1 — Clarify Requirements

Phase 2 — Estimate Scale

Phase 3 — Core API Surface

Phase 4 — High-Level Architecture

Phase 5 — Deep-Dive: The Five Hardest Problems

1. Transcode Pipeline Reliability

2. CDN Cache Hit Rate

3. View Count — Eventual Consistency Done Right

4. Feed Generation — Fan-out on Write vs Fan-out on Read

5. Search Freshness

Putting It All Together — Key Design Decisions Table

What You Have Now

Tutorial Complete!