Real-World System Design Case Studies

Design a Video Streaming Service

YouTube receives over 500 hours of uploaded video every minute and delivers more than 1 billion hours of watch time per day across 100+ countries. Designing a system at this scale forces you to solve three distinct, hard problems simultaneously: ingestion (how do raw uploads become streamable files?), storage (where do petabytes of video live?), and delivery (how does every viewer on every device get smooth playback?). Each problem has its own architecture story, and together they make up one of the most instructive case studies in distributed systems.

Requirements

Functional requirements:

Users can upload videos (up to 10 GB each).
Uploaded videos are transcoded into multiple resolutions (360p, 480p, 720p, 1080p, 4K) and packaged in multiple formats (MP4 with H.264, WebM with VP9, and segmented HLS output).
Users can stream videos with adaptive bitrate — quality adjusts automatically to available bandwidth.
Users can search, like, comment, and subscribe (out of scope for this lesson; we focus on upload and streaming).

Non-functional requirements:

Availability: 99.99% (about 53 minutes of downtime per year); at this audience size even a short outage affects millions of concurrent viewers.
Latency: Video playback must start within 2 seconds; upload acknowledgment within 500 ms.
Throughput: 500 hours/minute upload; peak egress bandwidth in the tens of terabits per second globally.
Durability: Uploaded video must never be lost (3+ geo-redundant copies).
Scalability: Must scale horizontally for both ingestion and delivery without re-architecting.

Scale Estimation

Storage: 500 hours/minute × 60 minutes/hour = 30,000 hours of video uploaded per hour. At roughly 1 GB per hour of uploaded video, that is ≈ 30 TB of raw video per hour. Transcoding into 5 renditions multiplies stored bytes by roughly 2–3×, giving ~60–90 TB of final storage per hour, or ~650 PB per year at the midpoint. This is why a service at this scale ends up running its own data centres or negotiating aggressive cloud storage deals.
Bandwidth (egress): 1 billion watch-hours/day ÷ 24 hours/day ≈ 41.7 million concurrent streams on average. At 2 Mbps average bitrate: ~83 Tbps total egress, before accounting for peak-hour skew. No single CDN PoP can serve this; it requires a global, multi-tier CDN hierarchy.
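Transcoding workers: Processing 1 hour of 4K video takes ~20–40 minutes of CPU time per rendition. Taking 30 minutes as the midpoint and pricing every upload like 4K (an upper bound): 500 hours/minute of upload × 5 renditions × 30 min/job = 75,000 CPU-minutes of transcoding per minute of real time, i.e. roughly 75,000 cores kept continuously busy. This is a massively parallel compute problem.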
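The three estimates above can be reproduced with a few lines of arithmetic. The sketch below is only a back-of-envelope model: every constant is an assumption taken from this section (including roughly 1 GB per hour of uploaded video, the figure that yields ~30 TB of raw video per hour), not a measured value, and the function names are made up for illustration.

```python
# Back-of-envelope capacity model. Every constant is an assumption from the
# scale estimates in this lesson, not a measured figure.
UPLOAD_HOURS_PER_MINUTE = 500        # hours of video uploaded per wall-clock minute
GB_PER_VIDEO_HOUR = 1.0              # assumed average upload size
STORAGE_MULTIPLIER = (2.0, 3.0)      # stored bytes relative to the raw upload
WATCH_HOURS_PER_DAY = 1_000_000_000
AVG_BITRATE_MBPS = 2.0
RENDITION_COUNT = 5
CPU_MIN_PER_VIDEO_HOUR = 30          # per rendition, midpoint of 20-40 minutes


def storage_tb_per_hour():
    """Return (raw, low, high) terabytes written per wall-clock hour."""
    video_hours_per_hour = UPLOAD_HOURS_PER_MINUTE * 60
    raw_tb = video_hours_per_hour * GB_PER_VIDEO_HOUR / 1000
    low, high = STORAGE_MULTIPLIER
    return raw_tb, raw_tb * low, raw_tb * high


def average_concurrent_streams():
    """Watch-hours per day spread over 24 hours gives average concurrency."""
    return WATCH_HOURS_PER_DAY / 24


def egress_tbps():
    """Average egress in terabits per second."""
    return average_concurrent_streams() * AVG_BITRATE_MBPS / 1_000_000


def transcode_cpu_minutes_per_minute():
    """CPU-minutes of encoding work created per wall-clock minute."""
    return UPLOAD_HOURS_PER_MINUTE * RENDITION_COUNT * CPU_MIN_PER_VIDEO_HOUR
```

Changing a single constant (for example the average bitrate) shows immediately which assumption the final numbers are most sensitive to, which is usually the more useful outcome of an estimate like this.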

Key insight: Stored bytes grow with uploads, but egress grows with views: a video is stored once and then streamed every time it is watched. For a popular catalogue, delivery bandwidth therefore tends to dominate the infrastructure bill rather than storage. Every engineering decision about encoding efficiency, adaptive bitrate, and CDN placement is ultimately about reducing that egress bill.

High-Level Architecture

The system has two distinct planes: the upload and processing pipeline (write-heavy, async, latency-tolerant) and the playback pipeline (read-heavy, synchronous, latency-critical). Never mix them — processing load must not starve viewers.

Upload/processing pipeline (top) vs. read/playback pipeline (bottom) — separating these planes prevents transcoding spikes from degrading viewer experience.

Deep Dive 1 — The Upload Pipeline

A raw video upload can be gigabytes. Sending it as a single HTTP POST is fragile: a 30-second network hiccup fails the whole upload. Instead, the client splits the file into chunks (typically 5–20 MB each) and uploads each chunk independently. The server reassembles them. This enables resumable uploads: if the connection drops while chunk 47 is in flight, the client asks the server which chunks it already holds, retries chunk 47, and continues from there instead of starting over.

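Google's APIs expose a resumable upload protocol (used by the YouTube Data API and Google Cloud Storage); AWS offers S3 Multipart Upload. The details differ, but the pattern is broadly the same: (1) initiate the upload and receive an upload ID or session URI, (2) upload the chunks, identified by part number or byte range, (3) complete the upload so the storage layer stitches the parts into one object.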
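The initiate / upload parts / complete pattern can be sketched against an in-memory stand-in for the object store. `InMemoryUploadStore`, `split_chunks`, and `upload_missing_parts` are hypothetical names invented for this illustration; they do not mirror the real S3 or GCS client APIs, and the `fail_at` parameter exists only to simulate a dropped connection.

```python
import hashlib


class InMemoryUploadStore:
    """Hypothetical stand-in for an object store's multipart upload API."""

    def __init__(self):
        self._uploads = {}   # upload_id -> {"key": str, "parts": {number: bytes}}
        self.objects = {}    # key -> assembled bytes
        self._next_id = 0

    def initiate(self, key):
        self._next_id += 1
        upload_id = f"upload-{self._next_id}"
        self._uploads[upload_id] = {"key": key, "parts": {}}
        return upload_id

    def upload_part(self, upload_id, part_number, data):
        # Re-sending a part simply overwrites it, so retries are idempotent.
        self._uploads[upload_id]["parts"][part_number] = data

    def received_parts(self, upload_id):
        return set(self._uploads[upload_id]["parts"])

    def complete(self, upload_id, total_parts):
        upload = self._uploads[upload_id]
        missing = set(range(1, total_parts + 1)) - set(upload["parts"])
        if missing:
            raise ValueError(f"missing parts: {sorted(missing)}")
        data = b"".join(upload["parts"][n] for n in range(1, total_parts + 1))
        self.objects[upload["key"]] = data
        del self._uploads[upload_id]
        return hashlib.sha256(data).hexdigest()


def split_chunks(data, chunk_size):
    """Split a byte string into fixed-size chunks (the last may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def upload_missing_parts(store, upload_id, chunks, fail_at=None):
    """Send every part the server has not acknowledged yet.

    fail_at simulates a connection drop just before that part is sent.
    """
    already_received = store.received_parts(upload_id)
    for number, chunk in enumerate(chunks, start=1):
        if number in already_received:
            continue  # resume: skip parts the server already holds
        if number == fail_at:
            raise ConnectionError(f"connection dropped at part {number}")
        store.upload_part(upload_id, number, chunk)
```

Calling `upload_missing_parts` a second time after a `ConnectionError` sends only the parts that never arrived, which is the whole point of the protocol: the cost of a failure is one chunk, not the entire file.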

Best practice: Store the raw file first, then publish a job to a message queue. Never block the upload response on transcoding — it takes minutes. The creator gets instant acknowledgment; a worker picks up the job asynchronously. This decoupling is the single most important design decision in the upload pipeline.
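A minimal sketch of that decoupling, assuming an in-process `queue.Queue` as a stand-in for a durable message queue and a plain dictionary as a stand-in for the metadata database. The handler and worker names are hypothetical.

```python
import queue
import threading

transcode_jobs = queue.Queue()   # stand-in for a durable message queue
video_status = {}                # stand-in for the metadata database


def handle_upload_complete(video_id, raw_key):
    """Runs in the request path: record state, enqueue, return immediately."""
    video_status[video_id] = "processing"
    transcode_jobs.put({"video_id": video_id, "raw_key": raw_key})
    return {"video_id": video_id, "status": "processing"}


def transcode_worker(transcode):
    """Runs outside the request path; `transcode` is the slow encoding step."""
    while True:
        job = transcode_jobs.get()
        if job is None:              # sentinel value: shut the worker down
            transcode_jobs.task_done()
            break
        try:
            transcode(job["raw_key"])
            video_status[job["video_id"]] = "ready"
        except Exception:
            video_status[job["video_id"]] = "failed"
        finally:
            transcode_jobs.task_done()
```

The upload handler never calls `transcode` itself, so its latency is independent of video length. In a real deployment the queue would need to survive process restarts, which an in-memory queue does not.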

Deep Dive 2 — Transcoding at Scale

One uploaded video must become many files — different resolutions (360p, 480p, 720p, 1080p, 2160p) and different codecs (H.264 for broadest compatibility; VP9/AV1 for ~30–50% better compression). Multiplied across 500 upload-hours per minute, this is a massively parallel compute problem.

YouTube's proprietary transcoding system breaks each video into segments of a few seconds, distributes the segments across thousands of machines in parallel, and re-assembles the results. The key architectural idea: transcoding parallelizes along the time axis. Provided the splits fall on keyframe boundaries, each segment can be encoded independently, so segment 1 can transcode on machine A while segment 2 transcodes on machine B. This shrinks a 2-hour video's transcode time from hours to minutes.
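Splitting along the time axis can be illustrated with a thread pool standing in for the fleet of encoder machines. This is a sketch only: `encode_segment` is a placeholder that returns a label instead of invoking a real encoder, the 6-second segment length is an assumption, and a real splitter would cut on keyframe boundaries rather than at fixed timestamps.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_segments(duration_s, segment_s):
    """Return (index, start, end) tuples that cover the whole video."""
    segments = []
    start = 0.0
    index = 0
    while start < duration_s:
        end = min(start + segment_s, duration_s)
        segments.append((index, start, end))
        start = end
        index += 1
    return segments


def encode_segment(segment, rendition):
    """Placeholder for a real encoder call; returns a descriptive label."""
    index, start, end = segment
    return f"{rendition}/segment_{index:03d}[{start:.0f}-{end:.0f}s]"


def transcode_parallel(duration_s, rendition, segment_s=6.0, workers=8):
    """Encode all segments concurrently and return them in playback order."""
    segments = split_into_segments(duration_s, segment_s)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so merging is a concatenation.
        return list(pool.map(lambda s: encode_segment(s, rendition), segments))
```

Because each segment is independent, wall-clock time is bounded by the slowest segment plus scheduling overhead rather than by the length of the video. Threads are used here only because the placeholder does no CPU work; real encoding would run in separate processes or on separate machines.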

At the industry level this pattern maps to: a Directed Acyclic Graph (DAG) of tasks where each node is a transformation (split → encode → thumbnail → merge → publish). A workflow engine (Apache Airflow, AWS Step Functions, or a custom DAG runner) orchestrates the DAG.

Transcoding DAG: the splitter fans out segments to parallel encoder workers; the merger reassembles them; the HLS packager writes adaptive-bitrate manifests; the CDN origin receives the final assets.
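A custom DAG runner can be sketched with Python's standard-library `graphlib` module (Python 3.9+). The task names below are illustrative, and the runner executes tasks sequentially for clarity; every task inside one "wave" has all of its dependencies satisfied and could be dispatched to workers in parallel.

```python
from graphlib import TopologicalSorter

# Each key maps a task to the set of tasks it depends on.
# Task names are illustrative, not taken from any real pipeline.
PIPELINE = {
    "split": set(),
    "encode_360p": {"split"},
    "encode_720p": {"split"},
    "thumbnail": {"split"},
    "merge": {"encode_360p", "encode_720p"},
    "package_hls": {"merge"},
    "publish": {"package_hls", "thumbnail"},
}


def run_pipeline(graph, run_task):
    """Run tasks in dependency order; return the waves of runnable tasks."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()                 # raises CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())   # everything whose dependencies are done
        for task in ready:
            run_task(task)
            sorter.done(task)        # unblocks tasks that depend on this one
        waves.append(ready)
    return waves
```

For this graph the second wave contains both encoders and the thumbnail task, which is the fan-out described above. Workflow engines add what this sketch leaves out: retries, persistence of task state, and distribution across machines.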

Deep Dive 3 — Storage Architecture

Video bytes live in an object store (Amazon S3, Google Cloud Storage, or, inside Google, storage built on the Colossus file system). Object stores are ideal because:

Files are immutable once written — no update conflicts, no locking.
They scale to exabytes with no schema migrations.
They have a flat key-value interface: bucket/videoId/720p/segment_042.ts.
They support lifecycle policies: move infrequently accessed videos to cheaper cold storage (Glacier, Coldline) automatically.

Metadata (title, description, owner, view count, status, rendition URLs) lives in a relational database (MySQL with read replicas at YouTube). View counts, likes, and comments use a separate counter service that accepts high-write traffic and periodically flushes to the main DB — you cannot afford row-level locking on the videos table for every play event.
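The counter service idea can be sketched as an in-memory accumulator that batches increments and writes one delta per video on each flush. `ViewCounter` and the `flush_to_db` callback are hypothetical names for illustration; a production service would also need a durable log, since counts held only in memory are lost if the process crashes before a flush.

```python
import threading
from collections import Counter


class ViewCounter:
    """Accumulate view events in memory; write one delta per video on flush."""

    def __init__(self, flush_to_db):
        self._pending = Counter()
        self._lock = threading.Lock()
        self._flush_to_db = flush_to_db   # callable(video_id, delta)

    def record_view(self, video_id):
        # Hot path: an in-memory increment, no database round trip.
        with self._lock:
            self._pending[video_id] += 1

    def flush(self):
        """Swap out the pending batch, then write it without holding the lock."""
        with self._lock:
            batch, self._pending = self._pending, Counter()
        for video_id, delta in batch.items():
            self._flush_to_db(video_id, delta)
        return len(batch)
```

With this shape, a thousand plays of one video between flushes become a single `UPDATE ... SET views = views + 1000` style write instead of a thousand contended row updates. The trade-off is that displayed counts lag reality by up to one flush interval.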

Deep Dive 4 — CDN Delivery and Adaptive Bitrate

Serving tens of terabits per second from a single origin is not feasible. The solution is a Content Delivery Network: a global mesh of hundreds of Points of Presence (PoPs) that cache video segments close to viewers. YouTube is served from Google's own edge network, which includes Google Global Cache (GGC) nodes embedded inside ISP networks.

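Smooth streaming across variable-bandwidth connections relies on adaptive bitrate streaming over HTTP. This lesson uses HTTP Live Streaming (HLS) as the example; MPEG-DASH follows the same manifest-plus-segments model. How it works: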

The HLS packager produces a master manifest (.m3u8) listing all available quality levels.
Each quality level has its own media manifest listing short segments (.ts files, typically 6–10 seconds each).
The player downloads the master manifest, picks an initial quality based on current bandwidth, and starts fetching segments.
After each segment, the player measures download throughput. If bandwidth drops, it switches to a lower-quality manifest for the next segment — seamlessly, mid-video.
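The steps above can be sketched in two small functions: one that writes a master manifest, and one that picks a rendition from measured throughput. The bitrate ladder, the `index.m3u8` path layout, and the 0.8 safety factor are assumptions chosen for illustration; real players also take buffer level and recent throughput history into account.

```python
# Illustrative bitrate ladder: (name, peak bandwidth in bits/s, resolution).
RENDITIONS = [
    ("360p", 800_000, "640x360"),
    ("720p", 2_800_000, "1280x720"),
    ("1080p", 5_000_000, "1920x1080"),
]


def build_master_manifest(renditions):
    """Return an HLS master playlist listing one media playlist per rendition."""
    lines = ["#EXTM3U"]
    for name, bandwidth, resolution in renditions:
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={resolution}")
        lines.append(f"{name}/index.m3u8")   # assumed path layout
    return "\n".join(lines) + "\n"


def pick_rendition(renditions, measured_bps, safety=0.8):
    """Pick the highest rendition that fits within a fraction of throughput."""
    budget = measured_bps * safety
    affordable = [r for r in renditions if r[1] <= budget]
    if not affordable:
        # Nothing fits: fall back to the lowest bitrate rather than stalling.
        return min(renditions, key=lambda r: r[1])[0]
    return max(affordable, key=lambda r: r[1])[0]
```

The player would call `pick_rendition` after each segment download and fetch the next segment from the chosen rendition's media playlist. The safety factor leaves headroom so that a small dip in throughput does not immediately drain the buffer.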

Why short segments? A 6-second segment means the player can switch quality every 6 seconds. Shorter segments = more responsive adaptation but higher HTTP overhead. 6–10 seconds is the industry sweet spot.

The CDN caches both the static video segments (long TTL, essentially forever) and the manifests (shorter TTL so quality-level changes propagate). A signed URL or token protects premium content: the API server issues a time-limited, HMAC-signed URL; the CDN edge validates the signature before serving the segment.
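A minimal sketch of the signing and edge-side validation, using only the Python standard library. The query parameter names (`expires`, `sig`), the message layout, and the hard-coded `SECRET_KEY` are assumptions for illustration; each commercial CDN defines its own signed-URL format, and the key would normally come from a secret manager.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"demo-secret"   # hypothetical; shared by API server and CDN edge


def sign_url(path, expires_at, key=SECRET_KEY):
    """API server side: append an expiry and an HMAC over path + expiry."""
    message = f"{path}:{expires_at}".encode()
    signature = hmac.new(key, message, hashlib.sha256).hexdigest()
    return f"{path}?{urlencode({'expires': expires_at, 'sig': signature})}"


def verify_url(url, now=None, key=SECRET_KEY):
    """CDN edge side: reject expired or tampered URLs before serving."""
    now = time.time() if now is None else now
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        signature = params["sig"][0]
    except (KeyError, ValueError):
        return False
    if now > expires_at:
        return False
    message = f"{parsed.path}:{expires_at}".encode()
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking the signature via timing.
    return hmac.compare_digest(signature, expected)
```

Because the signature covers the path, a token issued for one segment cannot be reused for another, and because it covers the expiry, a client cannot extend its own access by editing the query string.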

Common pitfall — the Cold Start problem: A brand-new video has no CDN cache. If a creator with 10 million subscribers publishes a video and everyone clicks at once, all requests miss the CDN and hammer the origin simultaneously. Mitigations: (1) pre-warm CDN edges by pushing popular new videos before the public URL is live, (2) use CDN request coalescing (only one origin fetch per PoP per cache miss), and (3) rate-limit the origin so it does not fall over during the warm-up window.
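Request coalescing (mitigation 2) can be sketched as a cache in which the first request for a missing key becomes the "leader" and fetches from the origin, while concurrent requests for the same key wait for its result. `CoalescingCache` is a hypothetical single-process illustration of the idea; real CDN edges implement it inside the proxy, with eviction and TTLs that are omitted here.

```python
import threading


class CoalescingCache:
    """Cache where concurrent misses on one key share a single origin fetch."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._cache = {}
        self._in_flight = {}            # key -> Event set when the fetch ends
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            if key in self._cache:
                return self._cache[key]
            event = self._in_flight.get(key)
            leader = event is None
            if leader:
                event = threading.Event()
                self._in_flight[key] = event
        if leader:
            try:
                value = self._fetch(key)        # the only origin request
                with self._lock:
                    self._cache[key] = value
            finally:
                with self._lock:
                    del self._in_flight[key]
                event.set()                     # wake every waiting follower
            return value
        event.wait()
        with self._lock:
            if key in self._cache:
                return self._cache[key]
        # The leader's fetch failed; try again (this caller may become leader).
        return self.get(key)
```

With this in place, a burst of simultaneous requests for a brand-new segment produces one origin fetch per cache instance instead of one per viewer, which is what keeps the origin alive during the warm-up window.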

Trade-offs Summary

Decision	Chosen approach	Key trade-off accepted
Upload reliability	Chunked / resumable upload	More client complexity; no failed multi-GB uploads
Transcoding latency	Async queue + parallel workers	Video is not instantly available; creator sees processing delay
Video storage	Immutable object store	No in-place edits; re-transcode on quality change
Streaming protocol	HLS (adaptive bitrate)	Manifests add complexity; better than fixed-bitrate buffering
Metadata storage	Relational DB + read replicas	Eventually consistent reads from replicas; view counts handled by a separate service
Delivery	Multi-tier CDN	Cache invalidation complexity; massive bandwidth savings