Consistent Hashing

Updated June 6, 2026
M
Magic Magnets Team
9 min read

Consistent hashing is one of those concepts that sounds scary until you understand what problem it's solving. Once you get the problem, the solution clicks immediately.

The Problem: Naive Modulo Hashing

Imagine you're building a distributed cache with 4 nodes. You need to decide which node stores each key. The naive approach: hash the key and take the modulo.

node = hash(key) % number_of_nodes

This works great, until you add or remove a node.

Say you add a 5th cache node to handle growing load. Now every key remaps:

Old: node = hash(key) % 4 New: node = hash(key) % 5

The modulo changes. Almost every key now maps to a different node. Your entire cache is effectively invalidated. Every request that was hitting the cache now hits your database at the same time. This is called a cache stampede, and it can bring down your database during exactly the moment you were trying to add capacity.

algobase.dev
Naive modulo hashing with 4 nodes: each key maps to node = hash(key) % 4. Works perfectly until you add or remove a node. Adding a 5th node changes the modulo, causing ~80% of keys to remap instantly. Every remapped cache key is now a cache miss, flooding the database.
1 / 1

Naive modulo hashing with 4 nodes

Quiz Time

When you add a 5th node to a 4-node cluster using naive modulo hashing, approximately how many keys need to be remapped?

The same problem hits databases. If you're sharding a database by user ID with modulo, adding a new shard means rebalancing virtually all of your data. That's millions of rows moving between shards simultaneously.

Consistent hashing solves this by dramatically reducing how much remapping happens when nodes are added or removed.

The Hash Ring

Here's the core idea:

Instead of mapping keys to nodes directly, both keys and nodes are hashed onto a circular ring: a number line from 0 to 2^32, wrapped around on itself.

To find which node handles a key, hash the key to get its position on the ring, then walk clockwise until you hit a node. That node owns the key.

Now add a new node. It gets hashed to some position on the ring. Only the keys that fall between the new node and its counter-clockwise neighbor need to move. Everything else is undisturbed.

If you have N nodes and add one more, only ~1/N of keys need to be remapped. Compare this to naive modulo hashing where nearly all keys move.

Remove a node? The keys it was handling move to the next node clockwise. Again, only those keys move; everything else is untouched.

Quiz Time

In a consistent hash ring, how do you determine which node is responsible for a given key?

algobase.dev
Consistent hashing places both keys and nodes on a circular ring (0-360°). To find a key's node: hash the key to get its ring position, then walk clockwise to the first node. Each node owns one arc of the ring. Adding a new node between B and C moves only the keys in that arc; everything else stays put.
1 / 1

Consistent hash ring

Quiz Time

When a node is added to a consistent hash ring, only ~1/N of keys need to be remapped (where N is the new node count).

The Problem with Basic Consistent Hashing

The basic version has an issue: uneven distribution. When you hash N nodes onto a ring, they don't land evenly spaced. By chance, some nodes might end up responsible for a huge portion of the ring, while others sit nearly idle.

This is where virtual nodes (vnodes) come in.

Virtual Nodes

Instead of placing each physical node once on the ring, you place it many times; each physical node gets, say, 150 virtual node positions scattered around the ring.

The result: distribution becomes much more uniform, where each physical node handles roughly the same portion of the keyspace. When you add or remove a physical node, its keys are redistributed across many different nodes (not just one), preventing any single node from getting overloaded.

Virtual nodes also make heterogeneous clusters easy to handle. If you have nodes with different capacities, you give the more powerful nodes more virtual node positions so they end up owning a larger portion of the ring proportional to their capacity.

Quiz Time

What problem do virtual nodes (vnodes) solve in consistent hashing?

algobase.dev
Virtual nodes (vnodes) fix uneven distribution by placing each physical node at multiple positions scattered around the ring. When Node E is added, it takes ~20% of keys from each existing node, rather than all from one neighbor. Distribution stays even, and data migrates from multiple sources simultaneously, making it faster and more balanced.
1 / 1

Virtual nodes (vnodes)

Quiz Time

Virtual nodes make it harder to handle heterogeneous clusters where some nodes have more capacity than others.

Real-World Usage

Consistent hashing shows up in the internals of some of the most widely-used distributed systems:

Apache Cassandra

Cassandra uses consistent hashing to determine which nodes own which partition key ranges. Each node is responsible for a range of tokens on the ring. When you add a node, it takes over a portion of the ring from adjacent nodes, and only the data in that range needs to move. Cassandra defaults to 256 virtual nodes per physical node.

Quiz Time

Which of the following systems uses consistent hashing internally to manage partition key ranges, defaulting to 256 virtual nodes per physical node?

Amazon DynamoDB

DynamoDB's internal partitioning scheme is based on consistent hashing. When DynamoDB adds partitions as your table grows, consistent hashing limits the disruption to your data placement.

CDNs

Content delivery networks like Akamai use consistent hashing to route requests to edge nodes. When a new edge node comes online, only the content that now maps to it needs to be cached there, not the entire content library.

Distributed Caches

Systems like Memcached clusters (and many Redis setups) use consistent hashing client-side to determine which cache node holds a given key. Libraries like ketama implement this in client code.

Consistent Hashing in Practice

When you're designing a system that needs to distribute load across nodes, consistent hashing is worth reaching for when:

  • You need to frequently add or remove nodes (elasticity)
  • Cache invalidation from remapping is unacceptably expensive
  • You're sharding a large dataset and need controlled data movement on rebalancing

When you don't need it: small, fixed-size clusters where you rarely add or remove nodes, or where the data is cheap to rebalance.

Summary

Naive modulo hashing breaks badly when nodes are added or removed: nearly all keys remap, causing cache stampedes or massive data migrations. Consistent hashing solves this by placing both keys and nodes on a circular ring, so only ~1/N keys remap when a node is added. Virtual nodes address the uneven distribution problem by placing each physical node at multiple points on the ring. This technique underpins Cassandra, DynamoDB, CDNs, and distributed caches; anywhere you need elastic, low-disruption key-to-node mapping.

CAP Theorem

How helpful was this content?

Comments

0/2000

Sign in to join the discussion

Saved on this device only

Sign in to sync progress across devices