Cuckoo Filter

Updated June 3, 2026
M
Magic Magnets Team
6 min read

Imagine you are building a massive web scraper, and you want to make sure you never visit the same URL twice. You could store every URL in a giant database or a Hash Set, but keeping a billion URLs in memory would cost a fortune in RAM.

You need a way to say, "Have I seen this URL before?" using almost zero memory.

For a long time, engineers used Bloom Filters for this. But Bloom Filters have a fatal flaw: you can add items to them, but you can never delete items. If a URL becomes obsolete, you're stuck with it.

Enter the Cuckoo Filter. It gives you the same memory-saving magic as a Bloom Filter, but it allows you to delete items!

The Bird that Steals Nests

The Cuckoo Filter gets its name from the Cuckoo bird. In nature, the Cuckoo doesn't build its own nest. Instead, it finds another bird's nest, kicks their eggs out, and lays its own eggs there.

A Cuckoo Filter works exactly like this.

Think of the filter as an array of "nests" (buckets). Each bucket can hold a small number of "eggs" (fingerprints).

When you want to add an item, like the URL google.com:

  1. You don't store the whole URL. You create a tiny hash of it, called a fingerprint (e.g., A1).
  2. You run the URL through two different math functions to get two possible "nest" locations (e.g., Bucket 3 and Bucket 7).
  3. If Bucket 3 is empty, you put A1 there. You're done!
Quiz Time

What does a Cuckoo Filter store in each bucket instead of the full item?

The Eviction Dance

But what happens if Bucket 3 and Bucket 7 are both completely full? This is where the Cuckoo bird attacks.

  1. You forcibly kick out an existing fingerprint from Bucket 3 (let's say fingerprint B2).
  2. You put your new fingerprint A1 into Bucket 3.
  3. Now B2 is homeless! But wait. Every fingerprint has two possible locations.
  4. The system calculates the alternate location for B2 (say, Bucket 9) and tries to put it there.
  5. If Bucket 9 is full, B2 kicks out another fingerprint, and the cycle continues.

This chain reaction of kicking things out stops once an empty spot is found. It sounds chaotic, but mathematically, it resolves itself incredibly fast.

Quiz Time

What happens when both candidate buckets for a new item are full?

Deleting is Easy

Because we store actual fingerprints in the buckets, deleting is trivial. If you want to remove google.com, you generate its fingerprint (A1), check its two possible buckets, find A1, and simply erase it.

Bloom Filters can't do this because they just flip bits to 1. If you flip a bit back to 0, you might accidentally delete other items that shared that bit!

Quiz Time

Bloom Filters allow you to delete items just like Cuckoo Filters do.

The Catch: False Positives

Like Bloom Filters, Cuckoo Filters are probabilistic. When you ask, "Have I seen this URL?", the filter checks its two buckets. If it sees the fingerprint A1, it says, "Probably yes."

Why probably? Because two completely different URLs might accidentally generate the same tiny fingerprint.

So, the rule of Cuckoo Filters is:

  • False positives are possible (It might say you've seen a URL when you haven't).
  • False negatives are impossible (If it says you haven't seen a URL, you absolutely haven't).
Quiz Time

Which statement correctly describes the error guarantees of a Cuckoo Filter?

Real-World Examples

  • Network Routers: Use Cuckoo Filters to quickly check if a packet of data matches a known malicious signature without storing the full signatures in fast, expensive router memory.
  • Databases: Systems like Redis and modern databases use them to avoid expensive disk lookups. If the Cuckoo Filter says "this row doesn't exist," the database doesn't even bother checking the slow hard drive.
Quiz Time

A database using a Cuckoo Filter consults the filter before hitting the disk. If the filter says an item does NOT exist, what should the database do?

Quiz Time

Every fingerprint in a Cuckoo Filter has exactly two possible bucket locations.

Summary

A Cuckoo Filter is a hyper-efficient data structure for checking if an item exists in a massive dataset. By storing tiny fingerprints and using a clever "eviction" strategy to resolve collisions, it uses a fraction of the memory of a traditional Hash Set. Unlike Bloom filters, Cuckoo Filters allow you to dynamically add and delete items, making them the modern standard for fast, scalable existence checks.

HyperLogLog

How helpful was this content?

Comments

0/2000

Sign in to join the discussion

Saved on this device only

Sign in to sync progress across devices