Cache Invalidation
Updated June 8, 2026
There is a famous quote in computer science by Phil Karlton: "There are only two hard things in Computer Science: cache invalidation and naming things."
We've learned that caches make systems incredibly fast. But they introduce a major headache: stale data.
When data exists in two places (the source-of-truth database and the fast in-memory cache), you have to keep them synchronized. If the database updates, but the cache doesn't know about it, your application will serve old, incorrect data to the user. The process of removing or updating this old data in the cache is called Cache Invalidation.
Why is it so hard?
It sounds simple on paper: "When you update the database, just delete the cache."
But in a distributed system, things get messy:
- What if the cache deletion command fails due to a network timeout?
- What if multiple servers try to update the cache at the same instant (a race condition)?
- What if your cache stores a massive computed JSON object, and only one tiny property changed? Do you delete the whole thing?
The Analogy: The Outdated Menu
Imagine you run a restaurant. You print 100 menus (the Cache) and hand them out to customers to prevent them from constantly asking the kitchen (the Database) what's available.
Suddenly, you run out of salmon. The kitchen knows this. But the customers are holding menus that say salmon is available.
How do you fix this?
- Time-based: Tell customers, "Menus are only accurate for 5 minutes. Check the kitchen after that." (TTL)
- Event-based: Run into the dining room, snatch the menus out of their hands, and give them new ones. (Active Invalidation)
Strategy 1: Time-To-Live (TTL)
The simplest and most common form of cache invalidation isn't actually an active invalidation at all. It's just an expiration date.
When you write data to a cache like Redis, you attach a Time-To-Live (TTL) value, like 60 seconds. After 60 seconds, Redis automatically deletes the key. The next request will result in a cache miss, forcing the application to fetch fresh data from the database.
TTL expiry: when a key expires, the next request misses, fetches fresh data from the database, and repopulates the cache.
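As a minimal sketch of this flow, the snippet below uses a small in-memory class in place of Redis so it runs on its own. The names `TTLCache` and `get_profile`, the dict-backed "database", and the injectable clock are all assumptions made for illustration. With a real Redis server, the expiry would typically be set on the write itself (for example, `SET user:123 <value> EX 60`).

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-memory stand-in for a cache with per-key expiry (hypothetical)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # key -> (value, absolute expiry time)
        self._items: Dict[str, Tuple[Any, float]] = {}

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._items[key] = (value, self._clock() + ttl_seconds)

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: behave like a cache miss
            return None
        return value


def get_profile(user_id: int, cache: TTLCache, database: Dict[int, str],
                ttl_seconds: float = 60) -> str:
    """Serve from the cache; on a miss, read the database and repopulate."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    fresh = database[user_id]
    cache.set(key, fresh, ttl_seconds)
    return fresh
```

Passing the clock in as a parameter is only a convenience for demonstrating expiry without actually waiting 60 seconds.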
Pros:
- Incredibly easy to implement.
- Guarantees that the cache won't be stale forever.
Cons:
- During that 60-second window, users can still see stale data if the underlying record has changed.
- If a user updates their profile picture, and the old one stays on screen for a full minute, they might think the app is broken and try uploading it 5 more times.
Strategy 2: Write-Through (Active Update)
In this approach, the application doesn't just delete the cache; it updates it in real-time.
When a user updates their bio, the application:
- Writes the new bio to the database.
- Immediately overwrites the cache key with the new bio.
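The two steps above can be sketched as follows. Plain dictionaries stand in for the database and the cache, and the function names are hypothetical. A real system would issue a SQL update and a cache write over the network.

```python
from typing import Dict


def update_bio_write_through(user_id: int, new_bio: str,
                             database: Dict[int, str],
                             cache: Dict[str, str]) -> None:
    """Write-through: persist the change, then overwrite the cached copy."""
    database[user_id] = new_bio          # 1. write to the source of truth
    cache[f"user:{user_id}"] = new_bio   # 2. immediately overwrite the cache key


def read_bio(user_id: int, database: Dict[int, str],
             cache: Dict[str, str]) -> str:
    """Serve from the cache, falling back to the database on a miss."""
    key = f"user:{user_id}"
    if key not in cache:
        cache[key] = database[user_id]
    return cache[key]
```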
Pros:
- When both writes succeed in order, the cache holds the new value immediately, so users see their update on the very next read.
Cons:
- Wastes resources if you write data to the cache that no one ends up reading.
- Prone to race conditions. If Request A and Request B update the same row concurrently, a network delay can let Request B's database write land last while Request A's cache write lands last. The database then holds B's value and the cache holds A's, and they stay out of sync until the key is overwritten or expires.
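To make that interleaving concrete, here is a deterministic replay of one unlucky ordering. It is a sketch with no real concurrency: the four writes are simply executed in the order the race would produce, using dictionaries as stand-ins for the database and cache. The function name is hypothetical.

```python
from typing import Dict


def simulate_write_through_race() -> Dict[str, str]:
    """Replay one bad ordering of two concurrent write-through updates."""
    database: Dict[str, str] = {}
    cache: Dict[str, str] = {}
    key = "user:123"

    database[key] = "bio from A"   # Request A writes to the database
    database[key] = "bio from B"   # Request B writes to the database (lands last)
    cache[key] = "bio from B"      # Request B writes to the cache
    cache[key] = "bio from A"      # Request A's delayed cache write lands last

    return {"database": database[key], "cache": cache[key]}
```

After this sequence the database holds B's value while the cache holds A's, which is exactly the divergence described above.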
Strategy 3: Active Deletion (Cache Aside)
Instead of updating the cache, the application just deletes it.
When the bio is updated:
- Update the database.
- Delete the cache key (e.g., DEL user:123).
The next time a read occurs, the system naturally experiences a cache miss and fetches the fresh data.
Active deletion: the app updates the database and immediately deletes the stale cache key. The next read fetches fresh data.
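A minimal sketch of this pattern, again with dictionaries standing in for the database and Redis. The function names are assumptions made for illustration, and `cache.pop(...)` plays the role of `DEL user:123`.

```python
from typing import Dict


def update_bio_and_delete(user_id: int, new_bio: str,
                          database: Dict[int, str],
                          cache: Dict[str, str]) -> None:
    """Active deletion: update the database, then drop the cached copy."""
    database[user_id] = new_bio          # 1. update the database
    cache.pop(f"user:{user_id}", None)   # 2. delete the key (like DEL user:123)


def read_bio_cache_aside(user_id: int, database: Dict[int, str],
                         cache: Dict[str, str]) -> str:
    """Cache-aside read: on a miss, load from the database and repopulate."""
    key = f"user:{user_id}"
    if key not in cache:
        cache[key] = database[user_id]   # miss: rebuild from the source of truth
    return cache[key]
```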
Pros:
- Simpler than Write-Through.
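- Avoids the write-write race described under Write-Through: no request writes a value into the cache during an update, so the next read rebuilds it from the database. A narrower race remains, where a read that missed just before the update repopulates the cache with the old value afterward.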
Cons:
- Still vulnerable to network failures. If the DEL command fails to reach Redis, you have stale data. To mitigate this, always use this pattern alongside a sensible TTL as a safety net.
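The safety-net idea can be illustrated with a small simulation. Everything here is hypothetical: `FlakyCache` is an in-memory stand-in whose `delete` can be forced to fail, and the clock is injected so that expiry can be shown without waiting. It is a sketch of the failure mode, not a production recipe.

```python
from typing import Callable, Dict, Optional, Tuple


class FlakyCache:
    """In-memory TTL cache whose delete can be made to fail (hypothetical)."""

    def __init__(self, clock: Callable[[], float]) -> None:
        self._clock = clock
        self._items: Dict[str, Tuple[str, float]] = {}
        self.fail_deletes = False

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._items[key] = (value, self._clock() + ttl_seconds)

    def get(self, key: str) -> Optional[str]:
        entry = self._items.get(key)
        if entry is None or self._clock() >= entry[1]:
            self._items.pop(key, None)   # missing or expired: treat as a miss
            return None
        return entry[0]

    def delete(self, key: str) -> None:
        if self.fail_deletes:
            raise TimeoutError("simulated network timeout")
        self._items.pop(key, None)


def update_bio(user_id: int, new_bio: str, database: Dict[int, str],
               cache: FlakyCache) -> bool:
    """Update the database, then try to delete the key. Returns delete success."""
    database[user_id] = new_bio
    try:
        cache.delete(f"user:{user_id}")
        return True
    except TimeoutError:
        return False   # the stale entry survives until its TTL runs out


def read_bio(user_id: int, database: Dict[int, str], cache: FlakyCache,
             ttl_seconds: float = 60.0) -> str:
    """Cache-aside read that always attaches a TTL when repopulating."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    fresh = database[user_id]
    cache.set(key, fresh, ttl_seconds)
    return fresh
```

When the delete fails, readers keep seeing the old bio, but only until the TTL attached at read time expires. A real system might also retry the delete or log the failure, which this sketch leaves out.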
Advanced Pattern: Change Data Capture (CDC)
In very large systems, trusting the application code to perfectly remember to delete cache keys every time it writes to the DB is risky. Developers forget things.
Instead, systems use Change Data Capture tools like Debezium.
- The application just writes to the database. It doesn't talk to the cache at all.
- A separate background process tails the database's internal transaction log (like Postgres WAL or MySQL Binlog).
- When it sees an UPDATE users occur in the log, this background worker automatically pushes a message to invalidate the Redis cache.
This decouples your application logic from your caching logic. However the database is updated (even by a manual DBA script), the change appears in the log, so the cache still gets invalidated.
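The worker's core loop might look roughly like the sketch below. This is not Debezium's API or event format: `ChangeEvent`, `KEY_PREFIX_BY_TABLE`, and `invalidate_from_log` are hypothetical names, the events are assumed to have already been parsed from the transaction log, and a dictionary stands in for Redis.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class ChangeEvent:
    """Simplified stand-in for one row-level change read from the log."""
    table: str
    operation: str     # "INSERT", "UPDATE", or "DELETE"
    primary_key: int


# Assumed mapping from table name to cache key prefix (e.g. users -> user:<id>).
KEY_PREFIX_BY_TABLE = {"users": "user"}


def invalidate_from_log(events: Iterable[ChangeEvent],
                        cache: Dict[str, str]) -> List[str]:
    """Delete the cache key for every changed row in a cached table."""
    invalidated: List[str] = []
    for event in events:
        prefix = KEY_PREFIX_BY_TABLE.get(event.table)
        if prefix is None:
            continue   # this table is not cached; ignore the change
        if event.operation in ("UPDATE", "DELETE"):
            key = f"{prefix}:{event.primary_key}"
            cache.pop(key, None)
            invalidated.append(key)
    return invalidated
```

Note that the application code never appears here: the worker reacts only to what the log reports, which is what keeps write paths and cache logic separate.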
Summary
Cache invalidation is the delicate art of balancing performance with data freshness. If you don't care about a slight delay, rely on TTL expirations. If users need to see their updates instantly, use Active Deletion. For the most robust systems, combine Active Deletion with a background TTL safety net, or implement Change Data Capture to watch the database at the source.