Sharding vs Partitioning
Updated June 3, 2026Sooner or later, every successful company hits a wall: their database simply becomes too big.
When your users table hits 100 gigabytes, indexes stop fitting in RAM, queries slow to a crawl, and backing up the database takes hours. Buying a bigger server (Vertical Scaling) is incredibly expensive, and eventually, hardware limits out. You can't buy a server with infinite RAM.
When a single table gets too massive to handle, you have to chop it up into smaller pieces. In system design, there are two primary ways to do this: Partitioning and Sharding.
These terms are often confused, but they represent fundamentally different architectural decisions.
What is the key difference between partitioning and sharding?
Partitioning (Logical Division)
Think of Partitioning like organizing a massive filing cabinet. If you run an accounting firm, throwing every single receipt from the last ten years into one giant drawer is a nightmare. Instead, you buy folders. You put 2020 receipts in one folder, 2021 in another, and 2022 in a third.
They are all still in the same physical cabinet in the same room, but they are logically separated.
In database terms, Partitioning means breaking a giant table into smaller, logically distinct tables, but keeping them all on the same physical database server.
How it works:
You define a "Partition Key." A classic example is time.
If you have a massive logs table, you can tell PostgreSQL to partition it by month.
Under the hood, Postgres creates hidden mini-tables: logs_jan, logs_feb, logs_mar.
To the application developer, nothing changes. You still run SELECT * FROM logs WHERE date = 'Feb 15'. The database engine intercepts the query, realizes it only needs February's data, and instantly routes the query directly to the logs_feb mini-table, completely ignoring the rest of the year.
Pros of Partitioning:
- Easy Deletion: Want to delete data older than 5 years? Instead of running a slow, lock-heavy
DELETEquery on 50 million rows, you just runDROP TABLE logs_2018. It happens instantly. - Index Health: Smaller tables mean smaller B-Tree indexes, which are more likely to fit entirely in fast RAM.
- Application Transparency: Your backend code doesn't need to change. The database handles the routing.
Why is dropping a partition table faster than running a large DELETE query on the same data?
Cons of Partitioning:
- The Hardware Limit: Because all the partitions still live on the same physical server, you are still bound by the maximum disk space, CPU, and RAM of that single machine.
Partitioning solves the problem of slow queries on large tables.
Sharding (Physical Division)
If partitioning is organizing the filing cabinet, Sharding is realizing the filing cabinet is full, so you buy three new cabinets and put them in three different office buildings.
Sharding (sometimes called Horizontal Partitioning) means taking your massive table and distributing the rows across entirely different physical database servers.
How it works:
Imagine you have 300 million users. You spin up three completely separate PostgreSQL servers (Node A, Node B, Node C).
You choose a "Shard Key," such as user_id. You create a routing algorithm in your application layer. A simple algorithm is a modulus operator: user_id % 3.
- User 1 goes to Node B.
- User 2 goes to Node C.
- User 3 goes to Node A.
When your application needs to fetch User 2, it calculates the math, realizes User 2 lives on Node C, and opens a network connection directly to that specific server.
Your application runs `SELECT * FROM logs WHERE date = 'Feb 15'` against a partitioned table. Which component decides which partition to read?
Pros of Sharding:
- Infinite Scalability: If your servers fill up, you just buy a 4th server, update your routing logic, and move some data over. There is theoretically no limit to how much data you can store.
- Fault Tolerance: If Server A crashes, only 1/3 of your users are affected. The rest of the system stays online.
Which sharding con makes it hardest to scale beyond a simple modulus-based routing algorithm?
Cons of Sharding:
- Massive Complexity: Sharding is the hardest thing you can do in database architecture.
- No Cross-Shard Joins: If you want to join a User on Server A with an Order that accidentally got saved to Server B, you can't use SQL. You have to pull the data into your application memory and join it using code.
- Rebalancing Nightmare: What happens when User 3 becomes an influencer and generates a million times more data than everyone else? Server A fills up instantly (a "hot spot"). Moving data between shards to rebalance the cluster without downtime is an engineering nightmare.
You should consider sharding before trying vertical scaling or partitioning.
Summary
- When a table grows too large, it must be chopped into smaller pieces to maintain performance.
- Partitioning divides a large table into smaller tables that all reside on the same physical server. It optimizes indexes and makes managing historical data easy, but doesn't solve hardware limits.
- Sharding distributes the rows of a table across multiple physical servers. It provides infinite scalability but introduces massive architectural and application-level complexity.
- You should always exhaust Vertical Scaling and Partitioning before attempting to Shard a relational database.
How helpful was this content?
Comments
Sign in to join the discussion
Saved on this device only
Sign in to sync progress across devices