Distributed Cache Architecture

Updated June 8, 2026
M
Magic Magnets Team
9 min read

A single Redis instance running on one server can handle roughly 100,000 requests per second and hold a few hundred gigabytes of data. That sounds like a lot, until you're running an application that serves 10 million active users and needs to cache terabytes of session data, product catalogs, and computed results. At that scale, one server isn't enough. You need a distributed cache.

A distributed cache is a caching system spread across multiple servers that acts as a single logical cache from the perspective of your application. The challenge is making multiple machines look and behave like one.

Why You Need Distributed Caching at Scale

Three problems drive the need for distributing a cache:

  1. Memory limits — A single machine has a maximum amount of RAM. Once your working set exceeds that, you can't fit it on one server.
  2. Throughput limits — A single server has finite CPU and network bandwidth. At high enough request rates, it becomes the bottleneck.
  3. Availability — A single server is a single point of failure. If it goes down, your cache goes cold and your database absorbs all the traffic. At scale, that's catastrophic.

Distributing the cache across multiple nodes addresses all three.

Two Approaches: Replication vs Sharding

There are two fundamental ways to spread a cache across multiple nodes, and they solve different problems.

algobase.dev
Replication: every replica holds the full dataset. Writes go to the primary; reads spread across replicas for higher throughput. Adding replicas scales reads, not memory capacity.
1 / 1

Replication: every replica holds the full dataset. Writes go to the primary; reads spread across replicas for higher throughput.

algobase.dev
Sharding: consistent hashing routes each key to its shard based on a hash. Each shard owns a slice of the dataset and has its own replica. Adding shards scales total memory capacity.
1 / 1

Sharding: consistent hashing routes each key to its assigned shard. Each shard owns a slice of the dataset and has its own replica.

Replication

Every node holds a complete copy of the data.

When you write to a replicated cache, the write propagates to all replicas. When you read, you can read from any replica.

What it solves: Availability and read throughput. If one node fails, others have the full dataset. With N replicas, you can serve N times the read throughput.

What it doesn't solve: Memory. If your dataset is 1 TB and you have 3 replicas, you need 3 TB of total RAM. Replication doesn't help you cache more data. It helps you serve it faster and more reliably.

Common model: Redis Sentinel and Redis replication use a primary-replica model. One primary handles writes; replicas sync from the primary and handle reads.

Sharding (Partitioning)

Each node holds a subset of the data.

The dataset is split across N nodes. Each node is responsible for a range of keys. A key like user:123 might live on node 1, while user:456 lives on node 2.

What it solves: Memory and write throughput. With 3 nodes, you can cache 3 times as much data. Writes are distributed across nodes too.

What it doesn't solve: Availability on its own. If a node fails, all the keys on that node are unavailable.

In practice: You almost always combine sharding with replication. Each shard has its own replica(s). Redis Cluster does exactly this. It shards data across 16,384 hash slots distributed across N nodes, with each node being the primary for some slots and a replica for others.

Consistent Hashing: How Keys Find Their Node

Naive sharding (key hash % number_of_nodes) has a fatal flaw: when you add or remove a node, almost every key maps to a different node. Your cache hit rate drops to zero while everything remaps.

Consistent hashing fixes this. The basic idea:

  • Map both nodes and keys onto a ring (imagine a circle of hash values from 0 to 2^32)
  • Each key is assigned to the nearest node clockwise on the ring
  • When you add a node, it only takes responsibility for keys that are now "closer" to it — typically around 1/N of keys
  • When you remove a node, its keys are redistributed to the next node — again, only 1/N of keys

With consistent hashing, adding or removing a node only disrupts ~1/N of keys, keeping the majority of your cache warm.

Modern systems use "virtual nodes" (vnodes): each physical node is represented as multiple points on the ring to ensure even key distribution.

Cache Coherence Challenges

Distributing your cache introduces new failure modes that a single-node cache doesn't have.

The Thundering Herd / Cache Stampede

A hot key expires. Before any node can repopulate it, hundreds of simultaneous requests flood your database. With a distributed cache, this is worse. Potentially all N nodes independently trigger a database query for the same key.

Solutions:

  • Probabilistic early expiration — Randomly refresh a key before it actually expires
  • Mutex on miss — Use a distributed lock so only one request repopulates the key; others wait
  • Local in-process cache — Keep a small L1 cache in each application server for the hottest keys

Split Brain

In a replicated setup, if the network partitions and the primary can't communicate with replicas, you have to choose: let the primary keep accepting writes (risking divergence), or pause writes until connectivity is restored.

Redis Sentinel handles this with quorum-based leader election. A new primary is elected only if a majority of sentinels can see it, preventing split brain.

Replication Lag

Writes to the primary take time to propagate to replicas. During that window, a read from a replica returns stale data. For eventually-consistent use cases (session data, feed content), this is fine. For balance-checking or inventory counts, you need to read from primary — or not cache at all.

Real Distributed Cache Systems

Redis Cluster

The native Redis clustering solution. Automatically shards across nodes using consistent hashing (16,384 hash slots), replicates each shard, and handles failover automatically. Supports horizontal scaling by adding nodes. Most teams use Redis Cluster via managed services like AWS ElastiCache or Google Cloud Memorystore.

Limitation: Multi-key operations across shards (like MGET across different keys) are restricted. Keys in the same "hash slot" are guaranteed to be colocated. You can force this with hash tags: {user:123}:profile and {user:123}:settings both hash on user:123, landing on the same shard.

Memcached

Simpler than Redis. It does one thing: distributed in-memory key/value caching. No persistence, no pub/sub, no data structures beyond strings. But it's extremely fast and horizontally scalable by design.

Sharding in Memcached is done client-side. The client library decides which server to send each key to using consistent hashing. There's no built-in replication; availability is handled by the client failing over to another server.

Facebook historically used Memcached at massive scale (hundreds of servers, trillions of requests per day). Their 2013 paper on "Scaling Memcache at Facebook" is a classic read on distributed caching at extreme scale.

Hazelcast

A distributed, in-memory data grid designed from the ground up for clustering. Unlike Redis (which started as a single-node system and added clustering later), Hazelcast was built as a peer-to-peer distributed system. Each node is equal — no primary/replica distinction by default.

Popular in Java ecosystems and enterprise environments. Supports distributed maps, queues, locks, and computation.

Client-Side Caching

One technique worth knowing: client-side caching keeps a local cache in the application process itself, separate from the distributed cache.

The idea is a two-level hierarchy:

  • L1: In-process cache (e.g., an LRU map in your application server) — microsecond reads, limited size
  • L2: Distributed cache (Redis Cluster) — millisecond reads, large capacity

Hot keys (like configuration values, feature flags, global counters) live in L1. Everything else goes to L2. You get the speed of local memory for the hottest data and the scale of a distributed system for everything else.

Redis 6+ supports a client-side caching protocol that lets Redis notify clients when a cached key changes, keeping local caches invalidated automatically.

Summary

Distributed caching is the answer to what happens when your cache grows beyond what a single machine can handle.

  • Replication increases read throughput and availability — every node has the full dataset
  • Sharding increases capacity and write throughput — each node holds a slice of the dataset
  • Consistent hashing minimizes key remapping when nodes are added or removed
  • Cache coherence challenges — stampedes, split brain, replication lag — require deliberate mitigation strategies
  • Redis Cluster is the most common production choice; Memcached and Hazelcast are alternatives with different trade-offs

The architecture pattern that most teams land on: Redis Cluster with per-shard replication, deployed as a managed service, fronted by client-side caching for the hottest keys. Scale horizontally by adding shards. Monitor hit rate, memory, and replication lag.

Cache Invalidation

How helpful was this content?

Comments

0/2000

Sign in to join the discussion

Saved on this device only

Sign in to sync progress across devices