Block vs File vs Object Storage
Updated June 3, 2026

When a system design interviewer asks "where do you store the images?" or "how does Dropbox persist files?", the answer depends on understanding three fundamentally different storage abstractions: block, file, and object storage. They look similar on the surface (they all store bytes), but they differ in how they expose data, how they scale, and what they're actually good at.
Block Storage: Raw Volumes
[Figure: block, file, and object storage, with their access patterns and use cases]
Block storage presents itself to the operating system as a raw disk — a sequence of fixed-size blocks (typically 512 bytes or 4KB each). The OS formats it with a filesystem (ext4, NTFS, XFS) and treats it like a local disk.
Think of it like a blank hard drive that you plug into a server. The storage system doesn't know or care about files, directories, or metadata. It just reads and writes blocks at specific offsets.
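To make "reads and writes blocks at specific offsets" concrete, here is a minimal sketch that uses an ordinary file as a stand-in for a raw volume. The `BlockDevice` class, its method names, and the 4096-byte block size are illustrative assumptions, not any vendor's API, and `os.pread`/`os.pwrite` assume a Unix-like system.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # bytes per block (assumed; 512 is the other common size)


class BlockDevice:
    """Hypothetical stand-in for a raw volume: fixed-size blocks addressed by number."""

    def __init__(self, path: str, num_blocks: int) -> None:
        self.num_blocks = num_blocks
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT)
        os.ftruncate(self.fd, num_blocks * BLOCK_SIZE)  # zero-filled "disk"

    def _check(self, block_no: int) -> None:
        if not 0 <= block_no < self.num_blocks:
            raise IndexError("block number out of range")

    def write_block(self, block_no: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("writes must be exactly one block")
        self._check(block_no)
        # The device only sees "block N": it has no notion of files or directories.
        os.pwrite(self.fd, data, block_no * BLOCK_SIZE)

    def read_block(self, block_no: int) -> bytes:
        self._check(block_no)
        return os.pread(self.fd, BLOCK_SIZE, block_no * BLOCK_SIZE)

    def close(self) -> None:
        os.close(self.fd)


def demo_block_device() -> bytes:
    with tempfile.TemporaryDirectory() as tmp:
        dev = BlockDevice(os.path.join(tmp, "volume.img"), num_blocks=8)
        dev.write_block(5, b"x" * BLOCK_SIZE)  # random write: blocks 0-4 are untouched
        first, fifth = dev.read_block(0), dev.read_block(5)
        dev.close()
    return first[:1] + fifth[:1]
```

A filesystem such as ext4 is the layer that turns these numbered blocks into files and directories; the volume itself never sees that structure.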
Examples: AWS EBS (Elastic Block Store), Google Persistent Disk, Azure Managed Disks
Characteristics:
- Attached to a single instance (usually)
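- Low latency: sub-millisecond to single-digit milliseconds for network-attached volumes such as EBS, with microsecond-level access only on locally attached NVMe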
- The OS/application controls everything: filesystem layout, caching, buffering
- Supports random reads and writes efficiently
Who uses it: Databases. Relational databases like PostgreSQL and MySQL run on block storage because they need to control exactly how data is written to disk — they implement their own page management, write-ahead logging, and fsync behavior. Running a database on a network file system or object store would be a disaster. EBS io2 volumes on AWS, for example, offer provisioned IOPS specifically for database workloads.
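The combination of write-ahead logging, `fsync`, and in-place page writes can be sketched as follows. This is a heavily simplified illustration, not how PostgreSQL or MySQL is actually implemented: the function name, the file layout, and the log record format are assumptions, and real engines add checksums, batching, and checkpoints. The 8192-byte page matches PostgreSQL's default page size.

```python
import os
import tempfile

PAGE_SIZE = 8192  # PostgreSQL's default page size; other engines differ


def durable_page_update(wal_path: str, data_path: str, page_no: int, page: bytes) -> None:
    """Log the change, force it to disk, then overwrite the page in place."""
    if len(page) != PAGE_SIZE:
        raise ValueError("page must be exactly PAGE_SIZE bytes")

    # 1. Append the intended change to the write-ahead log and fsync it.
    wal_fd = os.open(wal_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND)
    try:
        os.write(wal_fd, page_no.to_bytes(8, "big") + page)
        os.fsync(wal_fd)  # once this returns, the change survives a crash
    finally:
        os.close(wal_fd)

    # 2. Overwrite the page at its fixed offset in the data file (random write).
    data_fd = os.open(data_path, os.O_RDWR | os.O_CREAT)
    try:
        os.pwrite(data_fd, page, page_no * PAGE_SIZE)
        os.fsync(data_fd)
    finally:
        os.close(data_fd)


def demo_durable_update() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        wal = os.path.join(tmp, "wal.log")
        data = os.path.join(tmp, "table.dat")
        durable_page_update(wal, data, page_no=3, page=b"p" * PAGE_SIZE)
        return os.path.getsize(wal), os.path.getsize(data)
```

Step 2 is the part an object store cannot offer: overwriting 8 KB in the middle of a large file without rewriting the rest.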
Block storage doesn't scale horizontally. You can make the volume bigger (within limits), but you can't easily split it across machines. It's fundamentally a single-machine abstraction.
File Storage: The Familiar Hierarchy
File storage organizes data into the hierarchical directory structure you've used your whole life: folders inside folders, files with names and paths. Unlike block storage, file storage speaks a filesystem protocol — NFS (Network File System) or SMB (Server Message Block) — so multiple machines can mount the same filesystem simultaneously.
Examples: AWS EFS (Elastic File System), Google Filestore, Azure Files, NFS servers, Dropbox (from the client's perspective)
Characteristics:
- Shared access — many servers can read and write the same files concurrently
- POSIX-style semantics: file locking, permissions, directory traversal (exact guarantees vary by protocol)
- Higher latency than block storage (it's a network call)
- Scales better than block storage, but with limits
Who uses it: Shared code repositories, legacy enterprise applications that expect a filesystem, ML training jobs that need to read the same large dataset from many workers simultaneously. AWS EFS is popular for Kubernetes workloads that need shared persistent storage accessible from any pod.
The familiar filesystem abstraction is both the strength and the weakness of file storage. It's intuitive, but the POSIX semantics (locking, ordering guarantees) make it hard to build a truly distributed, massively scalable system on top of it.
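As an illustration of the locking semantics mentioned above, this sketch increments a counter stored in a shared file while holding an exclusive advisory lock. The function name and the counter file are hypothetical, `fcntl` is Unix-only, and whether such locks are honored across machines on a network filesystem depends on the protocol version and mount options, so treat that part as an assumption.

```python
import fcntl
import os
import tempfile


def increment_shared_counter(path: str) -> int:
    """Read-modify-write a counter file under an exclusive advisory lock."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until no other process holds the lock
        raw = os.pread(fd, 32, 0).decode()
        value = int(raw or "0") + 1
        os.ftruncate(fd, 0)
        os.pwrite(fd, str(value).encode(), 0)
        os.fsync(fd)
        return value
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def demo_shared_counter() -> int:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "counter.txt")
        for _ in range(3):
            last = increment_shared_counter(path)
    return last
```

Every writer has to coordinate through the same lock, which is exactly the kind of shared state that becomes a bottleneck as the number of clients grows.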
Object Storage: Flat Namespace for Blobs
Object storage throws out the filesystem hierarchy entirely. Data is stored as objects in a flat namespace, identified by a unique key. An object is a bundle of: the data itself (any blob, any size), metadata (key-value pairs), and a unique identifier.
There are no directories — you can simulate them with key prefixes like photos/2024/01/15/img_001.jpg, but the storage system treats this as just a key with slashes in it. There's no concept of a current directory or relative path.
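A small in-memory sketch shows how "directories" fall out of plain string matching on keys. `FlatObjectStore` and its methods are hypothetical names; the prefix and delimiter listing is loosely modeled on what S3-style list APIs offer and is not a real client library.

```python
class FlatObjectStore:
    """Hypothetical in-memory object store: one flat dict of key -> (data, metadata)."""

    def __init__(self) -> None:
        self._objects = {}

    def put(self, key: str, data: bytes, metadata=None) -> None:
        self._objects[key] = (data, dict(metadata or {}))

    def get(self, key: str) -> bytes:
        return self._objects[key][0]

    def list(self, prefix: str = "", delimiter=None):
        """Return (keys, common_prefixes) for keys that start with `prefix`."""
        keys, common = [], set()
        for key in sorted(self._objects):
            if not key.startswith(prefix):
                continue
            rest = key[len(prefix):]
            if delimiter and delimiter in rest:
                # Roll everything up to the next delimiter into one "folder".
                common.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
            else:
                keys.append(key)
        return keys, sorted(common)


store = FlatObjectStore()
store.put("photos/2024/01/15/img_001.jpg", b"jpeg-bytes", {"content-type": "image/jpeg"})
store.put("photos/2024/02/01/img_002.jpg", b"jpeg-bytes")
store.put("photos/readme.txt", b"hello")

top_keys, top_folders = store.list(prefix="photos/", delimiter="/")
month_keys, month_folders = store.list(prefix="photos/2024/", delimiter="/")
```

Here `top_folders` contains only `photos/2024/`, even though no such object or directory was ever created: the "folder" exists purely because two keys share that prefix.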
Examples: AWS S3, Google Cloud Storage (GCS), Azure Blob Storage, Cloudflare R2, MinIO (self-hosted)
Characteristics:
- Accessed via HTTP APIs (PUT, GET, DELETE — not POSIX calls)
- Designed for massive scale — S3 stores exabytes of data
- Historically eventually consistent for some operations; S3 has provided strong read-after-write consistency since December 2020
- Cheap per GB: roughly 3-4x cheaper than general-purpose block storage at list price (S3 Standard versus EBS gp3), and more against provisioned-IOPS volumes
- Built-in features: versioning, lifecycle policies, replication, public URL serving
- No in-place updates within an object: a write typically replaces the whole object, although a read can fetch just a byte range
Who uses it: Everything that doesn't need a filesystem. Profile pictures, video files, backups, data lake storage, ML model artifacts, static website assets, log archives. Netflix stores its entire video catalog in S3. GitHub stores Git LFS objects in object storage. Every mobile app that lets you upload a photo is almost certainly putting it in object storage.
The key insight: object storage scales horizontally essentially without limit because there's no shared mutable state between objects. S3 can handle millions of requests per second across billions of objects because reads and writes to different objects are completely independent.
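The independence argument can be sketched by routing each key to a node with a hash. This is an illustration of the idea only, not a description of how S3 is built: `node_for_key` and `ShardedStore` are hypothetical, and production systems generally use consistent hashing or range partitioning so that adding a node moves little data.

```python
import hashlib


def node_for_key(key: str, num_nodes: int) -> int:
    """Pick a node from the key alone; no other object's state is consulted."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


class ShardedStore:
    """Hypothetical store in which each node is an independent dict."""

    def __init__(self, num_nodes: int) -> None:
        self.nodes = [{} for _ in range(num_nodes)]

    def put(self, key: str, data: bytes) -> None:
        self.nodes[node_for_key(key, len(self.nodes))][key] = data

    def get(self, key: str) -> bytes:
        return self.nodes[node_for_key(key, len(self.nodes))][key]


sharded = ShardedStore(num_nodes=4)
for i in range(1000):
    sharded.put(f"photos/img_{i:04d}.jpg", str(i).encode())

objects_per_node = [len(node) for node in sharded.nodes]
```

Because a request for one key touches exactly one node and needs no lock shared with other keys, throughput can grow by adding nodes. A block volume has no equivalent: the filesystem on top of it assumes one coherent view of every block.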
Performance, Cost, and Scalability Tradeoffs
| | Block Storage | File Storage | Object Storage |
|---|---|---|---|
| Latency | Sub-millisecond to low milliseconds | Milliseconds | 10s of milliseconds |
| Throughput | Very high | Moderate | Very high (with parallelism) |
| Random access | Excellent | Good | Poor (ranged reads only; writes replace the whole object) |
| Concurrency | Single instance | Multiple instances | Unlimited |
| Cost (per GB) | $$ | $$$ | $ |
| Max scale | TBs | TBs-PBs | Unlimited (exabytes) |
| Protocols | iSCSI, NVMe | NFS, SMB | HTTP (REST) |
Cost is one of the starkest differences. AWS EBS gp3 costs around $0.08/GB/month. EFS costs around $0.30/GB/month. S3 Standard costs $0.023/GB/month. For a company storing petabytes of user media, the difference between block and object storage is the difference between a manageable infrastructure bill and a catastrophic one.
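To put numbers on that, here is the arithmetic for one decimal petabyte at the per-GB prices quoted above. It is a rough sketch: it counts storage only and ignores request, throughput, IOPS, and data-transfer charges, and list prices vary by region and change over time.

```python
# Per-GB monthly list prices as quoted in the text (USD).
PRICE_PER_GB_MONTH = {
    "EBS gp3": 0.08,
    "EFS": 0.30,
    "S3 Standard": 0.023,
}

ONE_PB_IN_GB = 1_000_000  # decimal petabyte


def monthly_cost(gigabytes: float, price_per_gb: float) -> float:
    """Storage-only monthly cost in USD."""
    return gigabytes * price_per_gb


costs = {name: monthly_cost(ONE_PB_IN_GB, price) for name, price in PRICE_PER_GB_MONTH.items()}
ebs_to_s3_ratio = PRICE_PER_GB_MONTH["EBS gp3"] / PRICE_PER_GB_MONTH["S3 Standard"]

for name, cost in costs.items():
    print(f"{name:12s} ${cost:>10,.0f} per month")
```

At these prices one petabyte comes to about $80,000 per month on EBS gp3, $300,000 on EFS, and $23,000 on S3 Standard, so block storage is roughly 3.5x the price of object storage per GB before any IOPS charges.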
When to Use Each
Use block storage when:
- Running a database or other disk-bound stateful system (PostgreSQL, MySQL, Elasticsearch, Kafka)
- You need low-latency, random I/O
- Your application expects a local disk
Use file storage when:
- Multiple servers need simultaneous read/write access to the same files
- You're running legacy software that expects POSIX semantics
- ML training jobs need shared access to large datasets
- You need file locking or directory-level operations
Use object storage when:
- Storing user-uploaded media (images, videos, documents)
- Building a data lake for analytics
- Serving static assets via CDN
- Long-term archival and backups
- Any blob that's written once and read many times
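The three checklists above can be condensed into a rule of thumb. The `Workload` fields and the order of the checks are assumptions made for illustration; real decisions also weigh cost, durability, and operational constraints.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of what an application needs from storage."""
    needs_random_in_place_io: bool = False  # e.g. database pages
    shared_filesystem_access: bool = False  # many servers, same files, POSIX semantics


def recommend_storage(workload: Workload) -> str:
    """Return 'file', 'block', or 'object' using the checklists as a heuristic."""
    if workload.shared_filesystem_access:
        return "file"    # many machines mounting one filesystem
    if workload.needs_random_in_place_io:
        return "block"   # one machine, low-latency random I/O
    return "object"      # whole blobs, written once and read many times


database = Workload(needs_random_in_place_io=True)
training_dataset = Workload(shared_filesystem_access=True)
user_uploads = Workload()
```

With these inputs, `recommend_storage` returns `"block"` for the database, `"file"` for the shared training dataset, and `"object"` for user uploads.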
A Practical Architecture Pattern
[Figure: practical architecture, with all three storage types working together]
Most non-trivial systems use all three:
- PostgreSQL runs on EBS (block storage) — the database needs low-latency random I/O and full control over disk layout.
- User profile pictures are stored in S3 (object storage) — cheap, infinitely scalable, served directly to clients via CloudFront CDN.
- Shared ML training datasets live on EFS (file storage) — multiple GPU instances need concurrent read access during training.
The mistake engineers make is trying to use one storage type for everything. Putting images in a database works at small scale and becomes a catastrophe at large scale. Trying to run a database on object storage doesn't work at all — you can't do random in-place writes to an S3 object.
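One common way to avoid putting images in the database is to store the bytes in object storage and keep only the object key in a table row. The sketch below assumes that pattern: an in-memory SQLite database stands in for PostgreSQL, a plain dict stands in for a bucket, and the table, column, and function names are all hypothetical.

```python
import sqlite3
import uuid

object_store = {}  # stand-in for a bucket: key -> bytes


def save_profile_picture(db: sqlite3.Connection, user_id: int, image_bytes: bytes) -> str:
    """Put the blob in object storage; record only its key in the database."""
    key = f"profile-pictures/{user_id}/{uuid.uuid4().hex}.jpg"
    object_store[key] = image_bytes
    db.execute("UPDATE users SET avatar_key = ? WHERE id = ?", (key, user_id))
    db.commit()
    return key


def load_profile_picture(db: sqlite3.Connection, user_id: int) -> bytes:
    row = db.execute("SELECT avatar_key FROM users WHERE id = ?", (user_id,)).fetchone()
    return object_store[row[0]]


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, avatar_key TEXT)")
db.execute("INSERT INTO users (id, name) VALUES (1, 'ada')")

saved_key = save_profile_picture(db, 1, b"\xff\xd8 fake jpeg bytes")
loaded = load_profile_picture(db, 1)
```

The database row stays a few dozen bytes no matter how large the image is, so the expensive, low-latency block volume holds only the data that actually needs random I/O.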
Summary
Block, file, and object storage solve different problems. Block storage is a raw disk abstraction: low latency, random access, single machine, and costlier per GB than object storage. Use it for databases and anything that needs OS-level filesystem control. File storage adds a shared filesystem: multiple machines, POSIX-style semantics, and higher latency than a local volume. Use it when multiple servers need concurrent file access. Object storage abandons the filesystem model for a flat key-value API: effectively unlimited scale, cheap, HTTP-native, but with no in-place updates within an object. Use it for blobs: media, backups, artifacts, static assets. In practice, mature systems use all three, matching each storage layer to the access pattern it's designed for.