Cache Stampede
Updated June 8, 2026
Caching is supposed to protect your database. But under the right (or wrong) conditions, your caching layer can become the very thing that causes your database to melt down.
This terrifying scenario is known as a Cache Stampede (also called a Dogpile or Thundering Herd).
What is a Cache Stampede?
A cache stampede occurs when a highly requested piece of data in the cache suddenly expires (its TTL runs out) or is deleted while a massive burst of concurrent requests for that same data is arriving.
Because the data is suddenly missing from the cache, every single request experiences a cache miss. In a panic, every single thread independently queries the primary database to recalculate the expensive data.
The database, which normally handles 10 queries a second, is suddenly hit with 10,000 identical, heavy queries almost at once. Its CPU spikes to 100%, its connections max out, and the database can crash.
Normal state: the warm cache absorbs all traffic. The database is completely idle.
Stampede: the TTL expires and every server misses simultaneously, flooding the database with thousands of identical heavy queries.
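To make the failure concrete, here is a minimal, hypothetical Python simulation of a naive cache-aside read. A plain dict stands in for Redis, `time.sleep` stands in for the heavy SQL query, and a `threading.Barrier` releases every request at the same moment to mimic a burst arriving right after expiry. The function name `simulate_naive_stampede` and its parameters are illustrative only, not from any library.

```python
import threading
import time


def simulate_naive_stampede(n_requests: int, query_seconds: float = 0.2) -> int:
    """Fire n_requests concurrent reads at an empty cache; return how many hit the DB."""
    cache: dict[str, str] = {}  # stand-in for Redis; empty because the key just expired
    db_queries = 0
    count_lock = threading.Lock()
    burst = threading.Barrier(n_requests)  # releases every request at the same moment

    def expensive_query() -> str:
        nonlocal db_queries
        with count_lock:
            db_queries += 1
        time.sleep(query_seconds)  # pretend this is the slow SQL query
        return "live scores"

    def handle_request() -> str:
        burst.wait()
        value = cache.get("live_scores")
        if value is None:                 # cache miss...
            value = expensive_query()     # ...so this request queries the DB itself
            cache["live_scores"] = value  # the cache is refilled only after the query ends
        return value

    threads = [threading.Thread(target=handle_request) for _ in range(n_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return db_queries


print(simulate_naive_stampede(50))
```

On a typical machine this prints a number at or very near 50: nearly every request misses, because none of them can see a cached value until the first 0.2-second query finishes.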
The Analogy: The Black Friday Door Buster
Imagine a Best Buy on Black Friday. The store manager knows there's a huge crowd outside, so they create a protective barrier (the Cache): a single employee stands at the door holding a megaphone, yelling the price of the new TV. The crowd is satisfied and doesn't enter the store.
But then, the megaphone battery dies (the Cache Expires).
Suddenly, the barrier is gone. 500 angry shoppers (the Requests) all rush through the front doors at the exact same time, sprinting toward the manager's office (the Database) to ask the exact same question: "How much is the TV?!" The manager is trampled.
How it happens in practice
Let's say you run a sports website. The front page has a "Live Scores" widget generated by a complex SQL query that takes 3 seconds to run. You cache the result in Redis with a TTL of 60 seconds.
- T=0: The cache is warm. 5,000 users per second are happily reading from Redis. The DB is sleeping.
- T=60: The TTL expires. Redis drops the key.
- T=60.001: Within the first millisecond, 5 requests arrive. They all check Redis. It's empty. All 5 send the heavy 3-second query to the DB.
- T=60.010: Roughly 50 requests have now arrived in total. Redis is still empty because the first 5 haven't finished calculating yet, so every one of them has sent its own query to the DB.
- T=61.000: A full second has passed. The DB is now crunching 5,000 identical heavy queries simultaneously. It locks up and dies.
Because the calculation takes 3 seconds, the cache stays empty for at least 3 seconds (longer once the overloaded DB slows every query down), exposing the database to the full fury of the internet.
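The timeline follows from simple arithmetic: while the key is missing, every arriving request becomes its own database query, so the number of duplicate queries is roughly the request rate multiplied by the length of the empty window. A small sketch of that estimate (the helper name is hypothetical, and it assumes a steady arrival rate and a fixed rebuild time):

```python
def duplicate_queries(requests_per_second: float, window_seconds: float) -> int:
    """Estimate DB queries issued while a hot key is missing from the cache.

    Assumes a steady arrival rate and that every request arriving inside the
    window misses the cache and runs the query itself (no stampede protection).
    """
    return round(requests_per_second * window_seconds)


rate = 5_000  # requests per second hitting the "Live Scores" widget
for window in (0.001, 0.010, 1.0, 3.0):
    print(f"{window:.3f}s empty -> ~{duplicate_queries(rate, window):,} queries")
```

This gives about 5, 50, 5,000, and 15,000 queries for the four windows; the last figure is what a full 3-second window implies even before the overloaded database stretches that window further.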
How to Prevent a Cache Stampede
There are three primary strategies to defend against the dogpile.
1. Locking / Mutex (The Bouncer)
When a request experiences a cache miss, it doesn't immediately query the database. Instead, it tries to acquire a lock (e.g., a Redis distributed lock).
- The first request gets the lock, queries the database, and repopulates the cache.
- The other 4,999 requests fail to get the lock. Instead of hitting the database, they pause briefly (e.g., sleep for 50ms) and then check the cache again.
- Analogy: Only one shopper is allowed into the manager's office. The rest must wait outside until that shopper returns with the answer.
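A minimal sketch of the bouncer pattern in Python, assuming a single process: a `threading.Lock` per key stands in for the distributed lock (with Redis, such a lock is commonly acquired with `SET key value NX PX <milliseconds>` so that a crashed holder's lock eventually expires). The class and parameter names are hypothetical. Note the second cache check after acquiring the lock: a request that lost the race may obtain the lock only after the winner has already refilled the cache, and it must not query the database again.

```python
import threading
import time
from typing import Any, Callable

_MISS = object()  # sentinel so a cached None is not mistaken for a miss


class LockingCache:
    """Cache-aside reads where only one caller per key may rebuild a missing value."""

    def __init__(self, loader: Callable[[str], Any], ttl: float, retry_delay: float = 0.05):
        self._loader = loader            # the expensive "database" call
        self._ttl = ttl                  # seconds before an entry expires
        self._retry_delay = retry_delay  # how long losers sleep before re-checking
        self._entries: dict[str, tuple[Any, float]] = {}  # key -> (value, expires_at)
        self._guard = threading.Lock()   # protects _entries and _rebuild_locks
        self._rebuild_locks: dict[str, threading.Lock] = {}

    def _lookup(self, key: str) -> Any:
        with self._guard:
            entry = self._entries.get(key)
        if entry is None or entry[1] <= time.monotonic():
            return _MISS
        return entry[0]

    def _lock_for(self, key: str) -> threading.Lock:
        with self._guard:
            return self._rebuild_locks.setdefault(key, threading.Lock())

    def get(self, key: str) -> Any:
        while True:
            value = self._lookup(key)
            if value is not _MISS:
                return value                   # cache hit
            lock = self._lock_for(key)
            if lock.acquire(blocking=False):   # try to become the single rebuilder
                try:
                    value = self._lookup(key)  # double-check: someone may have refilled it
                    if value is _MISS:
                        value = self._loader(key)
                        with self._guard:
                            self._entries[key] = (value, time.monotonic() + self._ttl)
                    return value
                finally:
                    lock.release()
            time.sleep(self._retry_delay)      # lost the race: wait, then re-check the cache
```

With 50 concurrent `get` calls on a cold key, the loader runs exactly once; the other 49 callers sleep and then read the refilled entry. In a distributed setup the lock also needs an expiry (the `PX` above) so a holder that dies mid-rebuild cannot leave everyone waiting forever.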
2. Probabilistic Early Expiration (PER)
Also known as the XFetch algorithm. Instead of everyone seeing the cache expire at exactly 60 seconds, we add randomness.
As an entry gets closer to its expiration time, the probability that a given request will "think" it has expired and rebuild it rises. The occasional request that wins this roll recomputes the value early, while every other request keeps reading the existing, slightly stale data. This prevents the sudden cliff-drop of expiration.
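The XFetch rule fits in one line. Let `delta` be the time the last recomputation took, `beta` a tuning knob (1.0 by default; larger values mean earlier refreshes), and `r` a uniform random number in (0, 1]. A request recomputes when `now - delta * beta * ln(r) >= expiry`. Because `ln(r)` is never positive, the left side is `now` plus a random head start that grows with `delta`, so expensive keys get refreshed earlier, and once `now` reaches `expiry` the condition always holds. The sketch below is a minimal, single-threaded Python version that assumes an in-process dict as the cache and an injectable clock and random source for testing; the class and function names are hypothetical.

```python
import math
import random
import time
from typing import Any, Callable


def should_recompute_early(now: float, delta: float, beta: float,
                           expiry: float, r: float) -> bool:
    """XFetch test. r must be in (0, 1]; -ln(r) >= 0 gives a random head start."""
    return now - delta * beta * math.log(r) >= expiry


class XFetchCache:
    """Single-threaded sketch of probabilistic early expiration (XFetch)."""

    def __init__(self, loader: Callable[[str], Any], ttl: float, beta: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 rng: Callable[[], float] = random.random):
        self._loader = loader
        self._ttl = ttl
        self._beta = beta
        self._clock = clock
        self._rng = rng
        self._entries: dict[str, tuple[Any, float, float]] = {}  # key -> (value, delta, expiry)

    def get(self, key: str) -> Any:
        entry = self._entries.get(key)
        now = self._clock()
        r = 1.0 - self._rng()  # random.random() is in [0, 1), so r is in (0, 1]
        if entry is None or should_recompute_early(now, entry[1], self._beta, entry[2], r):
            start = self._clock()
            value = self._loader(key)      # recompute (early, or after a real miss)
            delta = self._clock() - start  # remember how long the rebuild took
            self._entries[key] = (value, delta, self._clock() + self._ttl)
            return value
        return entry[0]                    # keep serving the existing value
```

With `rng` pinned to 0.0 (so `r` is 1.0 and the head start is zero), the sketch degenerates to an ordinary TTL; with a real random source and a nonzero `delta`, a few requests shortly before `expiry` refresh the key early. The sketch recomputes inline in the winning request and takes no lock, so a multi-threaded server may still see an occasional duplicate rebuild.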
3. External Recomputation (The Background Worker)
The safest, most bulletproof method is to remove expiration entirely for critical hot-keys. You set the TTL to "never expire."
Instead of letting user requests trigger the database query, you have an asynchronous cron job or worker process running behind the scenes. This worker queries the database every 10 seconds and overwrites the Redis key.
- User traffic never touches the database for this specific data.
- Even if there's a spike of a million users, they just read the in-memory cache.
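A minimal sketch of the background-worker approach in Python. A plain dict stands in for a Redis key written without a TTL (in Redis, a plain `SET` with no expiry option). The class name, the `live_scores` key, and the 10-second default interval are hypothetical illustrations of the setup described above. Readers only ever read the cache; the worker is the only code that calls the expensive loader.

```python
import threading
from typing import Any, Callable


class BackgroundRefresher:
    """Recomputes one hot key on a fixed interval; user requests never call the loader."""

    def __init__(self, cache: dict, key: str, loader: Callable[[], Any], interval: float = 10.0):
        self._cache = cache        # stand-in for Redis; the entry never expires
        self._key = key
        self._loader = loader      # the expensive database query
        self._interval = interval  # seconds between refreshes
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._cache[self._key] = self._loader()  # warm the key before serving traffic
        self._thread.start()

    def _run(self) -> None:
        # Event.wait returns False on timeout (time to refresh) and True once stop() is called.
        while not self._stop.wait(self._interval):
            try:
                self._cache[self._key] = self._loader()
            except Exception:
                pass  # keep serving the last good value if a refresh fails

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()


def read_live_scores(cache: dict) -> Any:
    return cache["live_scores"]  # user path: a pure cache read, never the database
```

With a 10-second interval, the database sees roughly one query every 10 seconds regardless of how many users are reading. The trade-off is that the data can be up to one interval stale, and the worker itself needs monitoring: if it dies, the key silently stops updating.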
Summary
A Cache Stampede is a catastrophic failure mode where a heavily trafficked cache key expires, causing a flood of concurrent requests to bypass the empty cache and crush the underlying database. To survive this, systems must implement concurrency control using locking mechanisms, add randomized early expiration, or entirely decouple cache rebuilding into a safe background worker.