HyperLogLog

Updated June 3, 2026

Magic Magnets Team

6 min read

Imagine you run YouTube, and you want to display the number of unique viewers on the "Gangnam Style" video.

If you just increment a counter every time the video is played, you get total views. But to count unique views, you have to remember the user ID of every single person who watched it, so you don't count them twice.

If a billion unique people watch the video, storing a billion user IDs in a Hash Set would take gigabytes of RAM. Now multiply that by billions of videos on YouTube. It's an impossible amount of memory just to track a view count.

To solve this, system designers use an algorithm with a very silly name: HyperLogLog.

Quiz Time

Why is storing every unique user ID in a Hash Set impractical for platforms like YouTube?

The Coin Flipping Game

To understand how it works, imagine you are locked in a room. Outside the room, your friend flips a coin. They flip it until they get "Heads", and then they yell the number of times they flipped it to you.

"I flipped it 1 time!"
"I flipped it 3 times!"
"I flipped it 2 times!"

If your friend yells, "I flipped it 10 times!", you know they had to flip "Tails" 9 times in a row.

Statistically, getting 9 Tails in a row is incredibly rare (a 1 in 512 chance). So, just by hearing "10 times," you can confidently guess that your friend has been flipping the coin for a very, very long time.

You don't need to count every single flip. You just need to remember the longest streak of Tails they ever got, and you can mathematically estimate the total number of flips.

Quiz Time

In the coin-flipping analogy, what piece of information does HyperLogLog actually store?

Hashing is the Coin Flip

Computers don't flip coins, but they generate Hashes.

When a user watches a YouTube video, HyperLogLog takes their User ID and hashes it into a long string of 0s and 1s (like 01001101...).

Instead of looking for streaks of "Tails," HyperLogLog looks at the hash and counts the streak of leading zeros (e.g., 0000101... has 4 leading zeros).

Just like the coin game, seeing a hash with a huge streak of leading zeros is statistically rare.

For every unique User ID, HyperLogLog hashes it and counts the leading zeros.
It simply remembers the highest number of leading zeros it has ever seen.
Using that highest number, it can mathematically estimate how many unique users must have passed through the system to generate a hash that rare!

Quiz Time

HyperLogLog uses leading zeros in a hash value the same way the coin-flipping game uses a streak of Tails.

The Catch: It's an Estimate

HyperLogLog doesn't give you the exact number. It gives you an estimate. Usually, it is accurate within 2%.

For YouTube, if a video has 10,000,000 unique views, showing 10,100,000 is perfectly fine. The users won't care about a 1% margin of error, but the engineers care deeply that they just tracked 10 million users using only 12 Kilobytes of memory.

Let that sink in: HyperLogLog can estimate the unique count of billions of items using just a few kilobytes of RAM.

Quiz Time

What is the typical error margin of a HyperLogLog estimate?

Quiz Time

HyperLogLog is only useful when exact counts are not required.

Real-World Examples

Reddit & YouTube: Estimating unique views on posts and videos.
Google Analytics: Quickly estimating the number of unique daily visitors to a website without scanning a massive database table.
Redis: Has a built-in HyperLogLog data structure (PFADD and PFCOUNT commands) specifically for tracking unique metrics in real-time.

Quiz Time

Which Redis commands are used to work with its built-in HyperLogLog data structure?

Summary

HyperLogLog is an incredibly memory-efficient algorithm used for the "Count-Distinct Problem." Instead of storing every unique item to count them, it hashes the items and looks for statistical anomalies (like long streaks of zeros) to estimate the total count. It trades a tiny bit of accuracy (usually ~2% error) for massive, gigabyte-saving memory reductions, making it essential for big data analytics.

Count-Min Sketch

How helpful was this content?

Comments

0/2000

Saved on this device only