Blob & Object Storage

18 min Lesson 8 of 10

Every serious application eventually produces data that relational databases were never designed to hold: profile photos, product images, video recordings, PDF invoices, ML model weights, database backups. These assets are large, immutable once written, and consumed far more often than they are updated. Stuffing them into a relational database as BLOB columns works at toy scale but collapses under real load. Object storage is the industry-standard answer: a flat store with virtually unlimited capacity, optimised specifically for large binary objects.

What Is Object Storage?

Unlike a filesystem (which organises data in a directory hierarchy) or a block device (which exposes raw sectors), object storage treats every piece of data as a self-contained object composed of three parts:

  • Key — a string identifier that is unique within its bucket, e.g. users/42/avatar.jpg (in S3 it is the bucket name, not the key, that must be globally unique). There is no real folder hierarchy; the slash is just part of the key string, though UIs often simulate folders from it.
  • Value — the raw bytes of the file (ranging from a few bytes to terabytes).
  • Metadata — a flat key-value map attached to the object: Content-Type, Cache-Control, custom tags (e.g. user-id: 42), creation time, and an ETag (an opaque version identifier, often but not always a hash of the content, used for cache validation and conditional requests).
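The three-part model can be illustrated with a small in-memory sketch. ToyBucket and StoredObject are hypothetical names invented for this example, and the ETag is assumed to be an MD5 digest of the content, which real services do not always guarantee. The list method shows how a UI can simulate folders purely from key prefixes.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    value: bytes                                   # raw bytes of the file
    metadata: dict = field(default_factory=dict)   # flat key-value map

    @property
    def etag(self) -> str:
        # Assumption: the ETag is modelled as an MD5 hex digest of the content.
        return hashlib.md5(self.value, usedforsecurity=False).hexdigest()


class ToyBucket:
    """Hypothetical in-memory bucket: one flat dict, no directories."""

    def __init__(self):
        self._objects = {}

    def put(self, key, value, **metadata):
        self._objects[key] = StoredObject(value, dict(metadata))

    def get(self, key):
        return self._objects[key]

    def list(self, prefix="", delimiter="/"):
        """Return (keys, simulated_folders) directly under `prefix`."""
        keys, folders = [], set()
        for key in sorted(self._objects):
            if not key.startswith(prefix):
                continue
            rest = key[len(prefix):]
            if delimiter in rest:
                # Everything up to the next delimiter looks like a "folder".
                folders.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
            else:
                keys.append(key)
        return keys, sorted(folders)


bucket = ToyBucket()
bucket.put("users/42/avatar.jpg", b"\xff\xd8...", content_type="image/jpeg")
bucket.put("users/42/cover.jpg", b"\xff\xd8!!!", content_type="image/jpeg")
bucket.put("users/7/avatar.jpg", b"\xff\xd8???", content_type="image/jpeg")

print(bucket.list(prefix="users/"))      # no keys, two simulated folders
print(bucket.list(prefix="users/42/"))   # two keys, no sub-folders
```

Nothing in the store knows about directories: the "folders" exist only in the output of list.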

The dominant public-cloud offering is Amazon S3 (Simple Storage Service), launched in 2006. Every major cloud has an equivalent, such as Google Cloud Storage (GCS) and Azure Blob Storage, and there are many S3-compatible systems, both self-hosted (MinIO, Ceph) and managed (Cloudflare R2). The S3 HTTP API has become the de-facto standard; if you understand S3, you understand the core model behind all of them.

Architecture: How Object Storage Works Internally

Object storage services are themselves distributed systems. Understanding their internals helps you predict their consistency guarantees and failure modes.

[Figure: object storage internal architecture. A client sends PUT/GET requests to an API gateway (auth, routing, rate limiting), which looks up the key-to-location map in a distributed KV metadata store and writes three chunk replicas to storage nodes A, B, and C. A CDN edge PoP serves cached reads and pulls from the origin on a miss.]
Object storage request flow: writes fan out to multiple storage nodes for durability; reads are served from a CDN edge on cache hit, falling back to the origin on a miss.

When you PUT an object, the API gateway authenticates the request, writes the key-to-location mapping to a distributed metadata store, and replicates the raw bytes across multiple physical storage nodes (typically 3x within a region). Erasure coding is often used at scale instead of full replication: the object is split into k data shards and m parity shards so the full object can be reconstructed from any k of the k + m shards, achieving comparable durability with far less storage overhead than 3x replication.
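To make the erasure-coding idea concrete, here is a deliberately simplified sketch that uses a single XOR parity shard, so it can repair only one lost shard. Production systems typically use Reed-Solomon style codes that tolerate several simultaneous losses; the function names and the 4 + 1 layout below are assumptions chosen for illustration.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list:
    """Split data into k equal data shards plus one XOR parity shard."""
    shard_len = -(-len(data) // k)                # ceiling division
    padded = data.ljust(shard_len * k, b"\0")     # zero-pad the tail
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = reduce(xor_bytes, shards)
    return shards + [parity]


def reconstruct(shards: list, original_len: int) -> bytes:
    """Rebuild the data when at most one shard (marked None) is missing."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can only repair one lost shard")
    if missing:
        present = [s for s in shards if s is not None]
        shards = list(shards)
        # XOR of all surviving shards equals the missing one.
        shards[missing[0]] = reduce(xor_bytes, present)
    return b"".join(shards[:-1])[:original_len]   # drop parity and padding


payload = b"object storage keeps bytes durable"
stored = encode(payload, k=4)        # 4 data shards + 1 parity shard
stored[2] = None                     # simulate losing one storage node
recovered = reconstruct(stored, len(payload))

overhead_parity = 5 / 4              # 1.25x raw storage for this toy scheme
overhead_replication = 3.0           # 3x raw storage for triple replication
```

The overhead figures at the end show the trade-off: this toy scheme stores 1.25 bytes per byte of data, whereas triple replication stores 3, although replication survives two lost copies and single parity survives only one lost shard.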

When you GET an object, the gateway consults the metadata store to find which nodes hold the chunks, fetches them in parallel, and streams the bytes to the client. In practice, a CDN sits in front for popular objects so most reads never reach the origin store at all.
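The read path described above can be sketched as a metadata lookup followed by a parallel fetch. The NODES and METADATA dictionaries are hypothetical stand-ins for real storage nodes and a distributed metadata store, and fetch_chunk stands in for a network call.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical storage nodes: node name -> {chunk_id: bytes}
NODES = {
    "node-a": {"c0": b"hello "},
    "node-b": {"c1": b"object "},
    "node-c": {"c2": b"storage"},
}

# Hypothetical metadata store: object key -> ordered (node, chunk_id) list
METADATA = {
    "docs/greeting.txt": [("node-a", "c0"), ("node-b", "c1"), ("node-c", "c2")],
}


def fetch_chunk(location):
    node, chunk_id = location
    return NODES[node][chunk_id]       # stands in for a network call


def get_object(key: str) -> bytes:
    locations = METADATA[key]          # 1. metadata lookup
    with ThreadPoolExecutor(max_workers=len(locations)) as pool:
        # 2. fetch chunks concurrently; map() preserves input order
        chunks = list(pool.map(fetch_chunk, locations))
    return b"".join(chunks)            # 3. reassemble for the client


print(get_object("docs/greeting.txt"))
```

A real gateway would stream chunks to the client as they arrive rather than buffering the whole object, and would fall back to another replica if a node failed.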

Consistency Model

From its launch in 2006 until December 2020, Amazon S3 offered only eventual consistency for overwrites, deletes, and listings. After replacing or deleting an object, a read could briefly return the old version; a newly written key could be missing from a list call, or even return a 404 if the key had been requested before it was created. Since December 2020, S3 guarantees strong read-after-write consistency for all PUTs and DELETEs, as well as strong list consistency. GCS has always been strongly consistent. MinIO and Ceph are also strongly consistent within a single cluster.

Legacy code built for eventual consistency. A large body of distributed-systems code written before 2020 defensively retries reads, caches ETags, or uses presigned URL tricks to work around S3 eventual consistency. Most of that complexity is no longer needed for new S3-backed workloads, but you will encounter it in older codebases.
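The kind of defensive read you may find in older codebases often looks roughly like the following sketch. NotFound, read_with_retry, and flaky_fetch are hypothetical names, and the simulated store simply makes the key visible on the third attempt.

```python
import time


class NotFound(Exception):
    """Hypothetical error raised when a key is not yet visible."""


def read_with_retry(fetch, key, attempts=5, base_delay=0.01, sleep=time.sleep):
    """Call fetch(key), retrying with exponential backoff on NotFound."""
    for attempt in range(attempts):
        try:
            return fetch(key)
        except NotFound:
            if attempt == attempts - 1:
                raise                       # give up after the last attempt
            sleep(base_delay * 2 ** attempt)


# Simulated eventually consistent store: the key appears on the third read.
calls = {"n": 0}


def flaky_fetch(key):
    calls["n"] += 1
    if calls["n"] < 3:
        raise NotFound(key)
    return b"payload for " + key.encode()


result = read_with_retry(flaky_fetch, "reports/2019.csv", sleep=lambda s: None)
```

On a strongly consistent store the first read succeeds, so a loop like this only adds latency when the object genuinely does not exist.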

Key Capabilities

Presigned URLs

You never want your application server to sit in the data path for large file uploads or downloads, because that wastes bandwidth, CPU, and memory on your servers. Instead, the server generates a presigned URL: a time-limited, cryptographically signed URL that grants the holder permission to perform one specific operation (such as GET or PUT) on one specific object, directly against the object storage service. The URL is not single-use: anyone holding it can repeat the operation until it expires, so keep the TTL short. The client uploads directly to S3; your server only receives a notification after the upload completes.

[Figure: presigned URL upload flow. (1) The client (browser or app) requests an upload URL from the app server. (2) The app server signs the URL with the AWS SDK and returns a presigned PUT URL (TTL = 5 min). (3) The client PUTs directly to object storage (S3 / GCS / R2), bypassing the app server. (4) An S3 event notifies the app server.]
Presigned URL pattern: the app server signs and issues a short-lived URL; the client uploads directly to object storage, keeping the app server out of the data path.
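The signing idea can be sketched with nothing but the standard library. This is not the real S3 scheme: S3 uses AWS Signature Version 4, and in practice you would let an SDK build the URL (for example boto3's generate_presigned_url). The secret, the storage.example.com host, and the expires/signature query parameters below are all assumptions made for illustration.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET = b"demo-secret-key"   # hypothetical key shared by signer and storage


def _signature(method: str, path: str, expires: int) -> str:
    message = f"{method}\n{path}\n{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def presign(method: str, path: str, ttl_seconds: int, now=None) -> str:
    """App server side: issue a URL valid for one method on one path."""
    expires = int(now if now is not None else time.time()) + ttl_seconds
    query = urlencode({"expires": expires,
                       "signature": _signature(method, path, expires)})
    return f"https://storage.example.com{path}?{query}"


def verify(method: str, url: str, now=None) -> bool:
    """Storage side: accept only an unexpired URL with a valid signature."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    if (now if now is not None else time.time()) > expires:
        return False
    expected = _signature(method, parts.path, expires)
    # Constant-time comparison avoids leaking signature bytes via timing.
    return hmac.compare_digest(signature.encode(), expected.encode())


url = presign("PUT", "/uploads/users/42/avatar.jpg", ttl_seconds=300, now=1_000)
```

Because the method, path, and expiry are all part of the signed message, changing any of them invalidates the signature: a PUT URL cannot be reused as a GET, pointed at another key, or extended past its TTL.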

Storage Classes and Lifecycle Policies

S3 offers multiple storage tiers at different price-performance points:

  • S3 Standard — millisecond retrieval, highest cost (~$0.023/GB/month). For frequently accessed data.
  • S3 Standard-IA (Infrequent Access) — same latency, lower storage cost, higher per-request cost. For assets accessed monthly rather than daily.
  • S3 Glacier Instant Retrieval — millisecond retrieval, with storage up to 68% cheaper than Standard-IA. For data accessed about once a quarter.
  • S3 Glacier Deep Archive — retrieval in hours, ~$0.00099/GB/month. For long-term compliance archives (medical records, financial logs).

Lifecycle policies automate transitions between tiers. A common pattern: keep objects in Standard for their first 30 days, transition them to Standard-IA after 30 days and to Glacier after 90 days, and delete them after 7 years. For data that is rarely read once it ages, this can reduce storage costs by 80%+ without any application code change.
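The pattern above might be expressed as the following configuration. The dictionary follows the shape of an S3 lifecycle configuration as accepted by SDKs such as boto3's put_bucket_lifecycle_configuration; the rule ID and the empty prefix (meaning every object) are illustrative assumptions. The helper storage_class_for_age is a hypothetical local simulation of the rule, not part of any SDK.

```python
# Shape follows the S3 lifecycle configuration document; the rule ID and the
# empty prefix (apply to every object) are illustrative assumptions.
LIFECYCLE = {
    "Rules": [{
        "ID": "tier-then-expire",
        "Status": "Enabled",
        "Filter": {"Prefix": ""},
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 90, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": 7 * 365},
    }]
}


def storage_class_for_age(age_days: int, config=LIFECYCLE):
    """Return the tier an object of this age would be in, or None if expired."""
    rule = config["Rules"][0]
    if age_days >= rule["Expiration"]["Days"]:
        return None
    current = "STANDARD"
    for step in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= step["Days"]:
            current = step["StorageClass"]
    return current


for age in (0, 45, 400, 3000):
    print(age, storage_class_for_age(age))
```

The service applies the transitions on its own schedule once the rule is attached to the bucket, so the application never has to move objects itself.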

Versioning and Durability

S3 advertises 11 nines of durability (99.999999999% per year). This means on average you would lose one object per 100 billion stored per year — achieved through cross-AZ replication and erasure coding. Enable versioning on any bucket holding user-generated content or backups: it keeps every version of every object so accidental deletes and corrupted overwrites are recoverable with a single API call.
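The durability figure is easier to reason about as arithmetic. This short sketch treats the advertised number as an independent annual loss probability per object, which is a simplifying assumption rather than a description of how failures actually occur.

```python
ANNUAL_LOSS_RATE = 1e-11    # 1 - 0.99999999999, per object per year


def expected_losses_per_year(object_count: int) -> float:
    return object_count * ANNUAL_LOSS_RATE


def years_per_lost_object(object_count: int) -> float:
    return 1 / expected_losses_per_year(object_count)


# 100 billion objects: about one expected loss per year.
print(expected_losses_per_year(100_000_000_000))

# 10 million objects: about one expected loss every 10,000 years.
print(years_per_lost_object(10_000_000))
```

Durability says nothing about accidental deletes or bad overwrites by your own code, which is why versioning matters regardless of how many nines the platform advertises.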

Object storage is not a database. You cannot query objects by content, perform partial updates, or run atomic multi-object transactions. If you need to find all photos tagged to user_id=42, store the metadata (key name, user ID, size, created_at) in a relational database and use object storage purely for the bytes. The key-to-metadata mapping in your DB is cheap; the binary data in S3 is cheap. Splitting them is the right design.
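A minimal sketch of this split, using SQLite for the metadata and a plain dictionary as a stand-in for the bucket. The table name, column names, and helper functions are assumptions for illustration.

```python
import sqlite3

bucket = {}                                 # stand-in for object storage
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE photos (
        object_key TEXT PRIMARY KEY,
        user_id    INTEGER NOT NULL,
        size_bytes INTEGER NOT NULL,
        created_at TEXT NOT NULL
    )
""")
db.execute("CREATE INDEX idx_photos_user ON photos (user_id)")


def save_photo(user_id, key, data, created_at):
    bucket[key] = data                      # the bytes go to object storage
    db.execute("INSERT INTO photos VALUES (?, ?, ?, ?)",
               (key, user_id, len(data), created_at))


def photo_keys_for_user(user_id):
    rows = db.execute(
        "SELECT object_key FROM photos WHERE user_id = ? ORDER BY created_at",
        (user_id,))
    return [row[0] for row in rows]


save_photo(42, "users/42/beach.jpg", b"...jpeg bytes...", "2024-05-01T10:00:00Z")
save_photo(42, "users/42/city.jpg", b"...more bytes...", "2024-05-02T09:30:00Z")
save_photo(7, "users/7/cat.jpg", b"...cat bytes...", "2024-05-03T08:15:00Z")

# The query runs against the database; only the matching keys touch the bucket.
for key in photo_keys_for_user(42):
    print(key, len(bucket[key]))
```

The two writes in save_photo are not atomic, so a real system needs a policy for orphans, for example writing the object first and periodically cleaning up keys that have no database row.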

Serving Objects to End Users

For private objects (medical records, signed contracts, private videos), generate presigned GET URLs with a short TTL (15–60 minutes) so only authenticated users can access them and the URL expires quickly if leaked.

For public or semi-public objects (profile photos, product images, public video), put a CDN in front. Configure the object storage bucket as the CDN origin. The CDN caches the object at edge nodes globally. A user in Tokyo fetching a photo stored in us-east-1 will get it from the nearest CDN edge node in milliseconds instead of hundreds of milliseconds from the origin. Set Cache-Control: public, max-age=31536000, immutable on versioned assets (assets whose key includes a content hash) so the CDN and browsers can cache them for up to a year without revalidating; because a changed file gets a new key, a stale copy is never served.
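One way to produce such versioned keys is to embed a short digest of the content in the file name at upload time. The helper names, the 12-character digest length, and the app.js example below are assumptions for illustration.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_key(key: str, content: bytes, digest_chars: int = 12) -> str:
    """Insert a short content hash before the extension: app.js -> app.<hash>.js"""
    path = PurePosixPath(key)
    digest = hashlib.sha256(content).hexdigest()[:digest_chars]
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


def upload_headers(content_type: str) -> dict:
    """Headers to attach when uploading a content-hashed asset."""
    return {
        "Content-Type": content_type,
        "Cache-Control": "public, max-age=31536000, immutable",
    }


v1 = hashed_key("static/app.js", b"console.log('v1');")
v2 = hashed_key("static/app.js", b"console.log('v2');")
print(v1)
print(v2)   # different content, therefore a different key
```

Since the key changes whenever the bytes change, pages that reference the new key bypass every cached copy of the old one without any CDN invalidation.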

Never expose your bucket publicly without a CDN or signed URLs. A publicly readable bucket with no CDN means every request hits your origin, you pay full egress bandwidth charges (S3 egress is ~$0.09/GB), and you gain no geographic performance benefit. Worse, accidentally public buckets containing sensitive data are a frequent cause of data breaches. Adopt a default "bucket private" policy and serve everything through signed URLs or a CDN.

Real-World Scale Numbers

Netflix stores its entire video library — hundreds of petabytes — in S3. Dropbox migrated away from S3 to their own object storage system (Magic Pocket) at hundreds of petabytes. Instagram at peak stored billions of photos in S3. Facebook built Haystack, a custom object store for photos, specifically because the per-request overhead of a general-purpose filesystem was too high at billions of photo reads per day. These systems all converge on the same architectural primitives you have just learned.

Summary

Object storage is the correct tool for large, immutable binary assets. It provides virtually unlimited capacity, 11-nines durability through replication and erasure coding, strong consistency on modern platforms, and a simple HTTP API. The key patterns are: keep binary data in object storage and metadata in a relational database; use presigned URLs to keep application servers out of the data path; use lifecycle policies to automatically tier data to cheaper storage classes; and front public assets with a CDN to achieve global low latency. Understanding these primitives lets you design the storage layer of any media-heavy, data-intensive system correctly from the start.