Blob & Object Storage
Every serious application eventually produces data that relational databases were never designed to hold: profile photos, product images, video recordings, PDF invoices, ML model weights, database backups. These assets are large, immutable once written, and read far more often than they are replaced. Storing them in a relational database as BLOB columns works at toy scale but bloats backups, replication traffic, and caches under real load. Object storage is the industry-standard answer: a flat store with virtually unlimited capacity, optimised specifically for large binary objects.
What Is Object Storage?
Unlike a filesystem (which organises data in a directory hierarchy) or a block device (which exposes raw sectors), object storage treats every piece of data as a self-contained object composed of three parts:
- Key: a string identifier that is unique within its bucket, e.g. users/42/avatar.jpg. There is no real folder hierarchy; the slash is just part of the key string, though UIs often simulate folders from it.
- Value: the raw bytes of the file (ranging from a few bytes to terabytes).
- Metadata: a flat key-value map attached to the object: Content-Type, Cache-Control, custom tags (e.g. user-id: 42), creation time, and an ETag (typically a content hash, used for cache validation and conditional requests).
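The object model above can be sketched with an in-memory stand-in. This is only an illustration, not how any real service is implemented: the ToyBucket class, its method names, and the use of SHA-256 for the ETag are all assumptions made for this example. It shows that keys live in one flat namespace and that "folders" are just a listing convention built from a prefix and a delimiter.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    value: bytes                                   # raw bytes of the object
    metadata: dict = field(default_factory=dict)   # flat key-value map
    etag: str = ""                                 # content hash, set on write


class ToyBucket:
    """Hypothetical in-memory bucket: a flat mapping from key to object."""

    def __init__(self):
        self._objects = {}

    def put(self, key, value, **metadata):
        # Assumption of this sketch: the ETag is the SHA-256 of the content.
        etag = hashlib.sha256(value).hexdigest()
        self._objects[key] = StoredObject(value, dict(metadata), etag)
        return etag

    def get(self, key):
        return self._objects[key]

    def list_prefix(self, prefix="", delimiter="/"):
        """Simulate folders: return direct keys and 'sub-folder' prefixes."""
        keys, folders = [], set()
        for key in sorted(self._objects):
            if not key.startswith(prefix):
                continue
            rest = key[len(prefix):]
            if delimiter in rest:
                folders.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
            else:
                keys.append(key)
        return keys, sorted(folders)


bucket = ToyBucket()
bucket.put("users/42/avatar.jpg", b"\xff\xd8...", content_type="image/jpeg", user_id="42")
bucket.put("users/42/docs/invoice.pdf", b"%PDF-1.7", content_type="application/pdf")
bucket.put("users/7/avatar.jpg", b"\xff\xd8...", content_type="image/jpeg")

keys, folders = bucket.list_prefix(prefix="users/42/")
print(keys)     # ['users/42/avatar.jpg']
print(folders)  # ['users/42/docs/']
```

Nothing is stored for the "folder" users/42/docs/ itself; it appears in the listing only because some key happens to start with that prefix.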
The dominant public-cloud offering is Amazon S3 (Simple Storage Service), launched in 2006. Every major cloud has an equivalent, such as Google Cloud Storage (GCS) and Azure Blob Storage, and there are many S3-compatible alternatives, both self-hosted (MinIO, Ceph) and managed (Cloudflare R2). The S3 HTTP API has become the de facto standard; if you understand S3, the concepts carry over to the rest.
Architecture: How Object Storage Works Internally
Object storage services are themselves distributed systems. Understanding their internals helps you predict their consistency guarantees and failure modes.
When you PUT an object, the API gateway authenticates the request, writes the key-to-location mapping to a distributed metadata store, and replicates the raw bytes across multiple physical storage nodes (typically 3x within a region). At scale, erasure coding is often used instead of full replication: the object is split into k data shards plus m parity shards, and with a Reed-Solomon-style code the full object can be reconstructed from any k of the k + m shards. This achieves comparable durability with far less storage overhead than 3x replication; a 10 + 4 layout, for example, stores 1.4x the original size and survives the loss of any four shards.
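The idea behind erasure coding can be illustrated with its simplest case: k data shards plus a single XOR parity shard, which tolerates the loss of any one shard. This is a deliberately reduced sketch; production systems generally use codes with several parity shards, and the function names below are hypothetical.

```python
def split_with_parity(data: bytes, k: int):
    """Split data into k equal-size data shards plus one XOR parity shard."""
    shard_len = -(-len(data) // k)               # ceiling division
    padded = data.ljust(shard_len * k, b"\0")    # zero-pad the tail
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = bytearray(shard_len)
    for shard in shards:
        for i, byte in enumerate(shard):
            parity[i] ^= byte
    return shards + [bytes(parity)]


def reconstruct(shards, original_len: int) -> bytes:
    """Rebuild the object when at most one shard (data or parity) is None."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost shard")
    shards = list(shards)
    if missing:
        shard_len = len(next(s for s in shards if s is not None))
        rebuilt = bytearray(shard_len)
        # XOR of all surviving shards equals the missing shard.
        for s in shards:
            if s is not None:
                for i, byte in enumerate(s):
                    rebuilt[i] ^= byte
        shards[missing[0]] = bytes(rebuilt)
    # Drop the parity shard and the zero padding.
    return b"".join(shards[:-1])[:original_len]


payload = b"object storage stores immutable blobs"
shards = split_with_parity(payload, k=4)
shards[2] = None  # simulate a failed storage node
assert reconstruct(shards, len(payload)) == payload
print(f"storage overhead: {len(shards) / 4:.2f}x")  # 5 shards for 4 shards of data
```

The comparison with 3x replication is not like for like here: one parity shard survives a single failure, whereas three full copies survive two. Adding more parity shards closes that gap while keeping the overhead well below 3x.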
When you GET an object, the gateway consults the metadata store to find which nodes hold the chunks, fetches them in parallel, and streams the bytes to the client. In practice, a CDN sits in front for popular objects so most reads never reach the origin store at all.
Consistency Model
For most of its history, from 2006 until December 2020, Amazon S3 offered only eventual consistency for overwrites, deletes, and listings: a read issued right after a write could return the previous version of an object, and in some cases a 404 for an object that had just been uploaded. Since December 2020, S3 guarantees strong read-after-write consistency for all PUT and DELETE operations, on new and existing objects alike, as well as strong list consistency. GCS has always been strongly consistent. MinIO and Ceph are also strongly consistent within a single cluster.
Key Capabilities
Presigned URLs
You never want your application server to sit in the data path for large file uploads or downloads — that wastes bandwidth, CPU, and memory on your servers. Instead, the server generates a presigned URL: a time-limited, cryptographically signed URL that grants the holder permission to perform exactly one operation (GET or PUT) directly against the object storage service. The client uploads directly to S3; your server only receives a notification after the upload completes.
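The mechanism behind a presigned URL can be sketched with a plain HMAC. This is a simplified illustration of the concept, not the actual signing algorithm used by S3: the secret, the host storage.example.com, the query parameter names, and both function names are assumptions made for this example. With boto3, the real equivalent is the S3 client's generate_presigned_url method.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, quote, unquote, urlencode, urlsplit

# Hypothetical secret shared by the application server and the storage service.
SECRET_KEY = b"demo-secret"


def _signature(method: str, key: str, expires: int) -> str:
    message = f"{method}\n{key}\n{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def presign(method: str, key: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Application server: sign one operation on one key until an expiry time."""
    expires = int((time.time() if now is None else now) + ttl_seconds)
    query = urlencode({"expires": expires, "signature": _signature(method, key, expires)})
    return f"https://storage.example.com/{quote(key)}?{query}"


def verify(method: str, url: str, now: Optional[float] = None) -> bool:
    """Storage service: accept only an unexpired URL signed for this method and key."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    key = unquote(parts.path[1:])
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    if (time.time() if now is None else now) > expires:
        return False
    # Constant-time comparison avoids leaking signature bytes through timing.
    return hmac.compare_digest(signature, _signature(method, key, expires))


upload_url = presign("PUT", "users/42/avatar.jpg", ttl_seconds=900)
print(verify("PUT", upload_url))  # True: right operation, not yet expired
print(verify("GET", upload_url))  # False: the URL was signed for PUT only
```

Because the method, key, and expiry are all covered by the signature, a client cannot reuse the URL for a different object or operation, or extend its lifetime, without invalidating it.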
Storage Classes and Lifecycle Policies
S3 offers multiple storage tiers at different price-performance points:
- S3 Standard: millisecond retrieval, highest storage cost (~$0.023/GB/month). For frequently accessed data.
- S3 Standard-IA (Infrequent Access): same latency, lower storage cost, higher per-request cost. For assets accessed monthly rather than daily.
- S3 Glacier Instant Retrieval: millisecond retrieval, with storage priced up to 68% below Standard-IA. For data accessed about once a quarter.
- S3 Glacier Deep Archive: retrieval in hours, ~$0.00099/GB/month. For long-term compliance archives (medical records, financial logs).
Lifecycle policies automate transitions between tiers. A common pattern: keep the last 30 days in Standard, transition to Standard-IA after 30 days, to Glacier after 90 days, and delete after 7 years. This can reduce storage costs by 80%+ without any application code change.
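The pattern just described might be expressed as the rule below, written as a Python dict in the shape of an S3 lifecycle configuration (with boto3 it would be passed to put_bucket_lifecycle_configuration). The rule ID and the empty prefix are assumptions, and storage_class_at is a hypothetical local helper that only simulates which tier an object of a given age would be in; it does not talk to any service.

```python
from typing import Optional

LIFECYCLE = {
    "Rules": [
        {
            "ID": "tier-then-expire",           # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": ""},           # assumption: apply to every object
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 7 * 365},    # roughly seven years
        }
    ]
}


def storage_class_at(age_days: int, rule: dict) -> Optional[str]:
    """Local simulation: tier for an object of this age, or None once expired."""
    if age_days >= rule["Expiration"]["Days"]:
        return None
    current = "STANDARD"
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


rule = LIFECYCLE["Rules"][0]
for age in (0, 45, 400, 3000):
    print(age, storage_class_at(age, rule))
```

Keeping the policy as data rather than application logic is what allows the tiering to change without touching application code.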
Versioning and Durability
S3 advertises 11 nines of durability (99.999999999% per year). This means on average you would lose one object per 100 billion stored per year — achieved through cross-AZ replication and erasure coding. Enable versioning on any bucket holding user-generated content or backups: it keeps every version of every object so accidental deletes and corrupted overwrites are recoverable with a single API call.
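The durability arithmetic can be checked directly. The sketch below assumes the advertised figure can be read as an independent per-object probability of loss per year, which is a simplification; exact fractions are used to avoid floating-point rounding at eleven decimal places.

```python
from fractions import Fraction

# 99.999999999% durability leaves a 1-in-10^11 chance of loss per object per year.
ANNUAL_LOSS_PROBABILITY = 1 - Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected number of objects lost per year under the assumption above."""
    return object_count * ANNUAL_LOSS_PROBABILITY


print(expected_annual_losses(100_000_000_000))  # 1
losses = expected_annual_losses(10_000_000)
print(f"10 million objects: about one loss every {1 / losses} years")
```

At 10 million stored objects, the expected rate works out to one lost object every 10,000 years, which is why accidental deletion and bad overwrites are a far more realistic threat than hardware loss.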
To answer a query such as "list every object belonging to user_id=42", store the metadata (key name, user ID, size, created_at) in a relational database and use object storage purely for the bytes. Rows that map keys to metadata are cheap to store and query in your DB; the binary data is cheap to store in S3. Splitting them is the right design.
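A minimal sketch of this split, using SQLite for the metadata and a plain dict as a stand-in for the object store. The table name, column names, and helper functions are assumptions chosen for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE objects (
        object_key TEXT PRIMARY KEY,
        user_id    INTEGER NOT NULL,
        size_bytes INTEGER NOT NULL,
        created_at TEXT NOT NULL
    )
""")
db.execute("CREATE INDEX idx_objects_user ON objects (user_id)")

blob_store = {}  # stand-in for the object store: key -> bytes


def save_upload(user_id: int, key: str, data: bytes, created_at: str) -> None:
    blob_store[key] = data  # the bytes go to object storage
    with db:                # the metadata row goes to the database
        db.execute(
            "INSERT INTO objects VALUES (?, ?, ?, ?)",
            (key, user_id, len(data), created_at),
        )


def keys_for_user(user_id: int) -> list:
    rows = db.execute(
        "SELECT object_key FROM objects WHERE user_id = ? ORDER BY created_at",
        (user_id,),
    )
    return [row[0] for row in rows]


save_upload(42, "users/42/avatar.jpg", b"\xff\xd8...", "2024-01-01T10:00:00Z")
save_upload(42, "users/42/invoice.pdf", b"%PDF-1.7", "2024-02-01T10:00:00Z")
save_upload(7, "users/7/avatar.jpg", b"\xff\xd8...", "2024-01-15T10:00:00Z")

print(keys_for_user(42))  # ['users/42/avatar.jpg', 'users/42/invoice.pdf']
```

The query runs against an indexed column in the database and never touches the object store; the bytes are fetched by key only when a specific object is actually needed.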
Serving Objects to End Users
For private objects (medical records, signed contracts, private videos), generate presigned GET URLs with a short TTL (15–60 minutes) so only authenticated users can access them and the URL expires quickly if leaked.
For public or semi-public objects (profile photos, product images, public video), put a CDN in front and configure the object storage bucket as the CDN origin. The CDN caches the object at edge nodes globally, so a user in Tokyo fetching a photo stored in us-east-1 gets it from the nearest edge node in milliseconds instead of waiting hundreds of milliseconds for the origin. Set Cache-Control: public, max-age=31536000, immutable on versioned assets (assets whose key includes a content hash) so the CDN and browsers can cache them for a full year without revalidating; because any change to the content produces a new key, a cached copy never goes stale.
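One way to build such versioned keys is to embed a digest of the content in the key itself. In the sketch below, the function names, the 16-character truncated SHA-256 digest, and the short max-age chosen for non-hashed keys are all assumptions for illustration.

```python
import hashlib
import posixpath


def versioned_key(prefix: str, filename: str, content: bytes) -> str:
    """Build a key that changes whenever the content changes."""
    digest = hashlib.sha256(content).hexdigest()[:16]
    stem, ext = posixpath.splitext(filename)
    return f"{prefix}/{stem}.{digest}{ext}"


def cache_headers(key_is_content_hashed: bool) -> dict:
    if key_is_content_hashed:
        # Safe to cache for a year: new content always gets a new key.
        return {"Cache-Control": "public, max-age=31536000, immutable"}
    # Assumption: a short TTL for keys whose content may be overwritten.
    return {"Cache-Control": "public, max-age=300"}


old_key = versioned_key("assets", "site.css", b"body { color: black; }")
new_key = versioned_key("assets", "site.css", b"body { color: navy; }")
print(old_key)
print(new_key)
print(cache_headers(key_is_content_hashed=True))
```

Publishing a change then means uploading under the new key and updating references to it, rather than invalidating CDN caches.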
Real-World Scale Numbers
Netflix stores its entire video library — hundreds of petabytes — in S3. Dropbox migrated away from S3 to their own object storage system (Magic Pocket) at hundreds of petabytes. Instagram at peak stored billions of photos in S3. Facebook built Haystack, a custom object store for photos, specifically because the per-request overhead of a general-purpose filesystem was too high at billions of photo reads per day. These systems all converge on the same architectural primitives you have just learned.
Summary
Object storage is the correct tool for large, immutable binary assets. It provides virtually unlimited capacity, 11-nines durability through replication and erasure coding, strong consistency on modern platforms, and a simple HTTP API. The key patterns are: keep binary data in object storage and metadata in a relational database; use presigned URLs to keep application servers out of the data path; use lifecycle policies to automatically tier data to cheaper storage classes; and front public assets with a CDN to achieve global low latency. Understanding these primitives lets you design the storage layer of any media-heavy, data-intensive system correctly from the start.