Count-Min Sketch

Updated June 3, 2026
M
Magic Magnets Team
6 min read

Imagine you are Twitter (or X), and during the Super Bowl, millions of tweets are flying by every second. You want to keep a real-time leaderboard of the "Top 10 Trending Hashtags."

If you try to keep a counter for every single hashtag that gets tweeted, you'll run out of memory fast. Millions of unique, random hashtags are generated every day (like #MyCatAteMyShoe2026). You don't care about the rare ones; you only care about the heavy hitters like #SuperBowl.

But how do you count the popular things without wasting memory counting the unpopular things? You use a Count-Min Sketch.

The Shared Chalkboard

Think of a Count-Min Sketch as a small grid on a chalkboard. It has a few rows and several columns. Every cell in the grid starts at zero.

When a hashtag comes in, like #SuperBowl, we don't store the word "SuperBowl." Instead, we do something clever:

  1. We take #SuperBowl and run it through three different Hash Functions.
  2. The first hash says "Row 1, Column 4". We go to that cell and add 1.
  3. The second hash says "Row 2, Column 7". We go to that cell and add 1.
  4. The third hash says "Row 3, Column 2". We go to that cell and add 1.

We do this for every single hashtag that passes through the system.

Quiz Time

What does a Count-Min Sketch store when an item (e.g., a hashtag) arrives?

Retrieving the Count

Ten minutes later, you want to know how many times #SuperBowl was tweeted. You simply run #SuperBowl through those exact same three Hash Functions. They point you back to the exact same three cells:

  • Row 1, Column 4 has the number 15,000.
  • Row 2, Column 7 has the number 15,050.
  • Row 3, Column 2 has the number 15,000.

Why are the numbers slightly different? Because other random hashtags might have accidentally hashed into cell (Row 2, Column 7) and added to its total! This is called a collision.

Because collisions only ever add to the number, the true count of #SuperBowl can never be higher than the lowest number among the three cells.

So, you take the minimum of the three counts (15,000). Hence the name: Count-Min Sketch.

Quiz Time

When querying the estimated count of an item, which value do you use from the grid?

Quiz Time

A Count-Min Sketch can undercount the true frequency of an item.

Why is this brilliant?

If a random hashtag like #MyCatAteMyShoe comes in, it hashes to three random cells. It bumps their numbers up. But who cares? We never ask for the count of that hashtag anyway!

The heavy hitters (the viral hashtags) will continually bombard their specific cells, driving the numbers up massively. The random noise of unpopular hashtags just adds a tiny bit of fluff to the background.

You get to track the frequency of highly recurring events using a fixed, tiny amount of memory, regardless of how many unique items pass through the system.

Quiz Time

What is the main trade-off a Count-Min Sketch makes compared to an exact counter like a hash map?

Real-World Examples

  • Twitter: Identifying trending topics and viral hashtags in real-time streams.
  • Network Routers: ISPs use them to detect DDoS attacks by finding the IP addresses that are sending disproportionately massive amounts of traffic, without tracking every single IP on the internet.
  • Recommendation Systems: Quickly finding the most frequent searches or clicks on an e-commerce platform.
Quiz Time

Using more independent hash functions (more rows) in a Count-Min Sketch reduces the chance that all rows suffer a collision for the same query.

Summary

Count-Min Sketch is a probabilistic data structure used to estimate the frequency of events in a massive data stream. By hashing items into a fixed-size 2D array of counters, it avoids storing the items themselves. When retrieving a count, it takes the minimum value of the item's hashed counters to minimize the impact of collisions. It trades exact accuracy for incredible memory efficiency, making it perfect for tracking "Heavy Hitters" and trending data.

MinHash

How helpful was this content?

Comments

0/2000

Sign in to join the discussion

Saved on this device only

Sign in to sync progress across devices