Vertical Partitioning

Updated June 3, 2026
M
Magic Magnets Team
7 min read

Most engineers, when they hear "partitioning a database," immediately think of splitting rows across machines. That's horizontal partitioning, also called sharding. But there's another dimension: splitting columns. That's vertical partitioning, and it solves a different set of problems in surprisingly elegant ways.

The Core Idea

Vertical partitioning means taking a wide table and splitting it into multiple narrower tables (or moving certain columns to entirely different storage systems), each sharing the same primary key.

Consider a users table:

ColumnAccess Pattern
idEvery query
emailLogin, profile
display_nameProfile, feed
bioProfile page only
avatar_urlProfile page only
profile_picture_blobProfile page only
created_atAnalytics
last_login_atAuth service

Almost every part of your system needs id, email, and display_name. But the full profile picture blob? That gets loaded maybe once when a user visits someone's profile. Why should your auth service or feed service drag that enormous column through memory on every request?

Quiz Time

What does vertical partitioning split, and how does it differ from horizontal partitioning?

Hot vs Cold Data

Vertical partitioning is really about hot and cold data. Hot data is accessed constantly — it needs to be fast, fit in memory, stay in your primary database. Cold data is accessed rarely — it should be cheap to store and doesn't need to live alongside your hot columns.

A clean split looks like this:

users table (hot, primary database):

id, email, display_name, hashed_password, created_at, last_login_at

user_profiles table (warm, same database, rarely joined):

user_id, bio, website_url, location, twitter_handle

user_media (cold, object storage or separate service):

avatar_url → S3 or GCS key profile_picture → S3 or GCS key

The users table is now much narrower. More rows fit in a single database page. More pages fit in the buffer pool (in-memory cache). Queries that only touch the hot columns — which is most queries — get dramatically faster because the database reads less data from disk per row.

Twitter does something similar with user objects: the core account data is in one store, the profile media in object storage, and the social graph (followers, following) in a separate graph database entirely.

Quiz Time

In the hot/cold data model, "hot" columns should be kept in the primary database because they are accessed constantly.

Moving Blobs to Object Storage

The most impactful form of vertical partitioning for many systems is pulling binary large objects (BLOBs) out of the database entirely.

Storing images, videos, or documents in a relational database is tempting because it feels simple — one place for all data. But databases are terrible at serving large binary files:

  • They hold database connections open while streaming large reads
  • BLOBs balloon the size of backups
  • They make replication lag worse (more data to ship)
  • They can't be served efficiently from a CDN

The right answer is almost always: store the binary in S3, GCS, or Azure Blob Storage, and put just the URL or storage key in your database. When the user needs the image, your application generates a signed URL and lets the object store serve it directly to the client — your application server isn't in the data path at all.

-- Before: user_photos table: (id, user_id, filename, data BYTEA) ← 10MB per row average -- After: user_photos table: (id, user_id, filename, s3_key TEXT, size_bytes INT) Photos stored in S3, served via CloudFront CDN

This is vertical partitioning in its most extreme form — you've moved columns to an entirely different storage tier.

Quiz Time

When a binary file is moved from a relational database to S3, what is the recommended way to serve it to a client?

Benefits for Query Performance

Narrower tables improve performance in several concrete ways:

More rows per page. Database pages are typically 8KB. A table with 10 columns averaging 200 bytes per row fits 40 rows per page. Strip it down to 5 columns averaging 80 bytes and you fit 100 rows per page. More rows per page means fewer I/O operations per query.

Better buffer pool efficiency. Databases cache pages in memory. A narrower table means more of your hot rows fit in the same amount of RAM. For a 16GB buffer pool, the difference can mean the hot dataset fits in memory entirely vs. spilling to disk constantly.

Smaller indexes. Indexes on narrow tables are smaller, which means they're more likely to stay in memory and traversal costs less.

Improved caching at the application layer. If you fetch a lean user object (id, email, display_name) on every API request, it's cheap to cache and invalidate. Fat objects with blobs attached are harder to cache efficiently.

Quiz Time

A narrower table stores more rows per database page, which reduces the number of disk I/O operations needed per query.

Real-World Example: E-Commerce Product Catalog

Amazon's product catalog is a classic vertical partitioning use case. Think about what a product record contains:

  • Hot: product_id, title, price, seller_id, category, average_rating, in_stock
  • Warm: description (long text, loaded on product page), bullet points, tech specs
  • Cold: high-res images (stored in S3), video reviews (stored in S3), detailed compliance documents

The product listing page loads fast because it only touches hot columns. The detail page makes a second request for the warm data. Images are served from CloudFront. The core database tables stay lean and cache-friendly.

When to Apply Vertical Partitioning

Vertical partitioning pays off when:

  • Your table has columns with radically different access frequencies
  • Some columns are very large (long text, blobs, JSONB documents)
  • You're hitting memory limits and want your hot data to fit in the buffer pool
  • Different parts of your system need different subsets of a wide table
  • You want to enforce access control at the storage level (e.g., only the profile service can read PII columns)

It's probably overkill when your table is already narrow, your dataset is small, or the joins required to reassemble the data add more complexity than the performance gain is worth.

Quiz Time

Which scenario is the strongest candidate for applying vertical partitioning?

The Cost: Joins

The obvious downside of vertical partitioning is that reassembling a complete record requires a join. If you split users into users and user_profiles, any query that needs both sets of columns must join them.

Joins aren't free. They add query complexity, can introduce performance issues if indexes aren't in place, and can complicate your ORM and application code. The discipline is making sure you actually access those "warm" columns infrequently enough that the join cost is worth the cache efficiency you gain on the hot path.

Quiz Time

Vertical partitioning is always worth applying, regardless of how often the separated columns are accessed.

Summary

Vertical partitioning is about separating hot, frequently-accessed columns from cold, rarely-needed ones, optionally moving those cold columns to cheaper, more appropriate storage tiers. It makes your core tables narrower, which means more data fits in memory, queries touch less data on disk, and your cache hit rates improve. The most common and impactful application is moving BLOBs out of relational databases and into object storage like S3. The tradeoff is join complexity: only partition when the access frequency difference genuinely justifies it.

Query Optimization

How helpful was this content?

Comments

0/2000

Sign in to join the discussion

Saved on this device only

Sign in to sync progress across devices