Cuckoo Filter
Updated June 3, 2026Imagine you are building a massive web scraper, and you want to make sure you never visit the same URL twice. You could store every URL in a giant database or a Hash Set, but keeping a billion URLs in memory would cost a fortune in RAM.
You need a way to say, "Have I seen this URL before?" using almost zero memory.
For a long time, engineers used Bloom Filters for this. But Bloom Filters have a fatal flaw: you can add items to them, but you can never delete items. If a URL becomes obsolete, you're stuck with it.
Enter the Cuckoo Filter. It gives you the same memory-saving magic as a Bloom Filter, but it allows you to delete items!
The Bird that Steals Nests
The Cuckoo Filter gets its name from the Cuckoo bird. In nature, the Cuckoo doesn't build its own nest. Instead, it finds another bird's nest, kicks their eggs out, and lays its own eggs there.
A Cuckoo Filter works exactly like this.
Think of the filter as an array of "nests" (buckets). Each bucket can hold a small number of "eggs" (fingerprints).
When you want to add an item, like the URL google.com:
- You don't store the whole URL. You create a tiny hash of it, called a fingerprint (e.g.,
A1). - You run the URL through two different math functions to get two possible "nest" locations (e.g., Bucket 3 and Bucket 7).
- If Bucket 3 is empty, you put
A1there. You're done!
What does a Cuckoo Filter store in each bucket instead of the full item?
The Eviction Dance
But what happens if Bucket 3 and Bucket 7 are both completely full? This is where the Cuckoo bird attacks.
- You forcibly kick out an existing fingerprint from Bucket 3 (let's say fingerprint
B2). - You put your new fingerprint
A1into Bucket 3. - Now
B2is homeless! But wait. Every fingerprint has two possible locations. - The system calculates the alternate location for
B2(say, Bucket 9) and tries to put it there. - If Bucket 9 is full,
B2kicks out another fingerprint, and the cycle continues.
This chain reaction of kicking things out stops once an empty spot is found. It sounds chaotic, but mathematically, it resolves itself incredibly fast.
What happens when both candidate buckets for a new item are full?
Deleting is Easy
Because we store actual fingerprints in the buckets, deleting is trivial. If you want to remove google.com, you generate its fingerprint (A1), check its two possible buckets, find A1, and simply erase it.
Bloom Filters can't do this because they just flip bits to 1. If you flip a bit back to 0, you might accidentally delete other items that shared that bit!
Bloom Filters allow you to delete items just like Cuckoo Filters do.
The Catch: False Positives
Like Bloom Filters, Cuckoo Filters are probabilistic.
When you ask, "Have I seen this URL?", the filter checks its two buckets. If it sees the fingerprint A1, it says, "Probably yes."
Why probably? Because two completely different URLs might accidentally generate the same tiny fingerprint.
So, the rule of Cuckoo Filters is:
- False positives are possible (It might say you've seen a URL when you haven't).
- False negatives are impossible (If it says you haven't seen a URL, you absolutely haven't).
Which statement correctly describes the error guarantees of a Cuckoo Filter?
Real-World Examples
- Network Routers: Use Cuckoo Filters to quickly check if a packet of data matches a known malicious signature without storing the full signatures in fast, expensive router memory.
- Databases: Systems like Redis and modern databases use them to avoid expensive disk lookups. If the Cuckoo Filter says "this row doesn't exist," the database doesn't even bother checking the slow hard drive.
A database using a Cuckoo Filter consults the filter before hitting the disk. If the filter says an item does NOT exist, what should the database do?
Every fingerprint in a Cuckoo Filter has exactly two possible bucket locations.
Summary
A Cuckoo Filter is a hyper-efficient data structure for checking if an item exists in a massive dataset. By storing tiny fingerprints and using a clever "eviction" strategy to resolve collisions, it uses a fraction of the memory of a traditional Hash Set. Unlike Bloom filters, Cuckoo Filters allow you to dynamically add and delete items, making them the modern standard for fast, scalable existence checks.
Saved on this device only
Sign in to sync progress across devices