Vertical Partitioning
Updated June 3, 2026Most engineers, when they hear "partitioning a database," immediately think of splitting rows across machines. That's horizontal partitioning, also called sharding. But there's another dimension: splitting columns. That's vertical partitioning, and it solves a different set of problems in surprisingly elegant ways.
The Core Idea
Vertical partitioning means taking a wide table and splitting it into multiple narrower tables (or moving certain columns to entirely different storage systems), each sharing the same primary key.
Consider a users table:
| Column | Access Pattern |
|---|---|
| id | Every query |
| Login, profile | |
| display_name | Profile, feed |
| bio | Profile page only |
| avatar_url | Profile page only |
| profile_picture_blob | Profile page only |
| created_at | Analytics |
| last_login_at | Auth service |
Almost every part of your system needs id, email, and display_name. But the full profile picture blob? That gets loaded maybe once when a user visits someone's profile. Why should your auth service or feed service drag that enormous column through memory on every request?
What does vertical partitioning split, and how does it differ from horizontal partitioning?
Hot vs Cold Data
Vertical partitioning is really about hot and cold data. Hot data is accessed constantly — it needs to be fast, fit in memory, stay in your primary database. Cold data is accessed rarely — it should be cheap to store and doesn't need to live alongside your hot columns.
A clean split looks like this:
users table (hot, primary database):
id, email, display_name, hashed_password, created_at, last_login_atuser_profiles table (warm, same database, rarely joined):
user_id, bio, website_url, location, twitter_handleuser_media (cold, object storage or separate service):
avatar_url → S3 or GCS key
profile_picture → S3 or GCS keyThe users table is now much narrower. More rows fit in a single database page. More pages fit in the buffer pool (in-memory cache). Queries that only touch the hot columns — which is most queries — get dramatically faster because the database reads less data from disk per row.
Twitter does something similar with user objects: the core account data is in one store, the profile media in object storage, and the social graph (followers, following) in a separate graph database entirely.
In the hot/cold data model, "hot" columns should be kept in the primary database because they are accessed constantly.
Moving Blobs to Object Storage
The most impactful form of vertical partitioning for many systems is pulling binary large objects (BLOBs) out of the database entirely.
Storing images, videos, or documents in a relational database is tempting because it feels simple — one place for all data. But databases are terrible at serving large binary files:
- They hold database connections open while streaming large reads
- BLOBs balloon the size of backups
- They make replication lag worse (more data to ship)
- They can't be served efficiently from a CDN
The right answer is almost always: store the binary in S3, GCS, or Azure Blob Storage, and put just the URL or storage key in your database. When the user needs the image, your application generates a signed URL and lets the object store serve it directly to the client — your application server isn't in the data path at all.
-- Before:
user_photos table: (id, user_id, filename, data BYTEA) ← 10MB per row average
-- After:
user_photos table: (id, user_id, filename, s3_key TEXT, size_bytes INT)
Photos stored in S3, served via CloudFront CDNThis is vertical partitioning in its most extreme form — you've moved columns to an entirely different storage tier.
When a binary file is moved from a relational database to S3, what is the recommended way to serve it to a client?
Benefits for Query Performance
Narrower tables improve performance in several concrete ways:
More rows per page. Database pages are typically 8KB. A table with 10 columns averaging 200 bytes per row fits 40 rows per page. Strip it down to 5 columns averaging 80 bytes and you fit 100 rows per page. More rows per page means fewer I/O operations per query.
Better buffer pool efficiency. Databases cache pages in memory. A narrower table means more of your hot rows fit in the same amount of RAM. For a 16GB buffer pool, the difference can mean the hot dataset fits in memory entirely vs. spilling to disk constantly.
Smaller indexes. Indexes on narrow tables are smaller, which means they're more likely to stay in memory and traversal costs less.
Improved caching at the application layer. If you fetch a lean user object (id, email, display_name) on every API request, it's cheap to cache and invalidate. Fat objects with blobs attached are harder to cache efficiently.
A narrower table stores more rows per database page, which reduces the number of disk I/O operations needed per query.
Real-World Example: E-Commerce Product Catalog
Amazon's product catalog is a classic vertical partitioning use case. Think about what a product record contains:
- Hot: product_id, title, price, seller_id, category, average_rating, in_stock
- Warm: description (long text, loaded on product page), bullet points, tech specs
- Cold: high-res images (stored in S3), video reviews (stored in S3), detailed compliance documents
The product listing page loads fast because it only touches hot columns. The detail page makes a second request for the warm data. Images are served from CloudFront. The core database tables stay lean and cache-friendly.
When to Apply Vertical Partitioning
Vertical partitioning pays off when:
- Your table has columns with radically different access frequencies
- Some columns are very large (long text, blobs, JSONB documents)
- You're hitting memory limits and want your hot data to fit in the buffer pool
- Different parts of your system need different subsets of a wide table
- You want to enforce access control at the storage level (e.g., only the profile service can read PII columns)
It's probably overkill when your table is already narrow, your dataset is small, or the joins required to reassemble the data add more complexity than the performance gain is worth.
Which scenario is the strongest candidate for applying vertical partitioning?
The Cost: Joins
The obvious downside of vertical partitioning is that reassembling a complete record requires a join. If you split users into users and user_profiles, any query that needs both sets of columns must join them.
Joins aren't free. They add query complexity, can introduce performance issues if indexes aren't in place, and can complicate your ORM and application code. The discipline is making sure you actually access those "warm" columns infrequently enough that the join cost is worth the cache efficiency you gain on the hot path.
Vertical partitioning is always worth applying, regardless of how often the separated columns are accessed.
Summary
Vertical partitioning is about separating hot, frequently-accessed columns from cold, rarely-needed ones, optionally moving those cold columns to cheaper, more appropriate storage tiers. It makes your core tables narrower, which means more data fits in memory, queries touch less data on disk, and your cache hit rates improve. The most common and impactful application is moving BLOBs out of relational databases and into object storage like S3. The tradeoff is join complexity: only partition when the access frequency difference genuinely justifies it.
How helpful was this content?
Comments
Sign in to join the discussion
Saved on this device only
Sign in to sync progress across devices