Data Compression

Updated June 3, 2026
Magic Magnets Team

Have you ever tried to stuff an oversized sleeping bag into a tiny stuff sack? It takes effort, but once it's in, it's much easier to carry around. That's data compression in a nutshell.

In system design, moving data around the network is expensive, both in terms of time and actual dollars. Storage is cheaper than it used to be, but it's still not free. That's why we compress data.

The Core Concept: Why Compress?

Think of it this way: if you have a message that says "AAAAABBBBB", why send those 10 characters when you could just say "5A5B"? That trick is called run-length encoding, and here it shrinks the message from 10 characters to 4, a 60% reduction.
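Here is a minimal sketch of that idea in Python. The function names `rle_encode` and `rle_decode` are made up for this illustration, and the decoder assumes the original text contains no digits.

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    """Replace each run of a repeated character with <count><char>."""
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))


def rle_decode(encoded: str) -> str:
    """Reverse rle_encode. Assumes the original text had no digits in it."""
    out = []
    count = ""
    for ch in encoded:
        if ch.isdigit():
            count += ch  # counts can span several digits, e.g. "12A"
        else:
            out.append(ch * int(count))
            count = ""
    return "".join(out)


message = "AAAAABBBBB"
encoded = rle_encode(message)
saving = 1 - len(encoded) / len(message)
print(encoded, f"{saving:.0%}")  # 5A5B 60%
```

Note that this toy scheme only wins when the input has long runs. A message like "ABC" becomes "1A1B1C", which is twice as long as the original.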

When we talk about data compression in large-scale systems, we're usually making a trade-off. We are trading CPU cycles (the compute power needed to compress and decompress) for network bandwidth and storage space.

[!NOTE] CPU is often much faster than network I/O. Taking a few milliseconds of CPU time to compress a payload can save hundreds of milliseconds of network transfer time.
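To make the trade-off concrete, the sketch below times a compression pass with Python's built-in `zlib` module and compares the estimated transfer time of the raw and compressed payloads. The payload and the 10 Mbit/s link speed are assumptions picked for illustration, and the payload is highly repetitive, so the savings are larger than you would see on typical data.

```python
import time
import zlib

# Hypothetical payload: repetitive JSON-like records, roughly 1.4 MB in total.
payload = b'{"user_id": 12345, "event": "page_view", "path": "/home"}\n' * 25_000

# Assumed link speed for this illustration: 10 megabits per second.
LINK_BYTES_PER_SEC = 10_000_000 / 8

# Spend CPU time: compress the payload and measure how long it takes.
start = time.perf_counter()
compressed = zlib.compress(payload, level=6)
compress_secs = time.perf_counter() - start

# Estimate the time on the wire for each version of the payload.
raw_transfer = len(payload) / LINK_BYTES_PER_SEC
compressed_transfer = len(compressed) / LINK_BYTES_PER_SEC

print(f"raw:        {len(payload):>9} bytes, ~{raw_transfer * 1000:.0f} ms to send")
print(f"compressed: {len(compressed):>9} bytes, ~{compressed_transfer * 1000:.0f} ms to send")
print(f"CPU time spent compressing: {compress_secs * 1000:.1f} ms")
```

The exact numbers depend on your machine and your data, so treat the output as a rough estimate rather than a benchmark.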


Two Main Flavors: Lossless vs. Lossy

1. Lossless Compression

With lossless compression, when you decompress the data, you get the exact original data back. Not a single bit is out of place.

  • Where it's used: Text files, database records, configuration files, executables.
  • Real-world examples:
    • Google uses Brotli and Gzip to compress web assets (HTML, CSS, JS) before sending them to your browser.
    • Databases compress data on disk, too. Cassandra compresses table blocks with LZ4 (its default), Snappy, or Zstd, and PostgreSQL can compress large column values with its built-in pglz or with LZ4.
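A quick way to see the "exact original data back" guarantee is a round trip through Python's built-in `gzip` module. The configuration text below is made up for the example.

```python
import gzip
import hashlib

# Hypothetical configuration file contents, repeated to give gzip something to work with.
original = ("server.port = 8080\nserver.host = example.internal\n" * 200).encode("utf-8")

compressed = gzip.compress(original)
restored = gzip.decompress(compressed)

# Lossless means byte-for-byte identical, so the checksums match as well.
same_bytes = restored == original
same_hash = hashlib.sha256(restored).digest() == hashlib.sha256(original).digest()

print(f"{len(original)} bytes -> {len(compressed)} bytes")
print("identical after round trip:", same_bytes and same_hash)
```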

2. Lossy Compression

With lossy compression, the decompressed data is a close approximation of the original, but some of the less important details are thrown away forever.

  • Where it's used: Images, video, audio.
  • Real-world examples:
    • Netflix uses highly advanced lossy video compression. When you watch a movie, you aren't seeing every single original pixel from the master copy. The compression algorithm throws away data that the human eye barely notices.
    • Spotify compresses audio streams so you can listen seamlessly on a mobile connection.
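Real video and audio codecs are far more sophisticated than anything that fits in a few lines, but the core idea of throwing detail away can be sketched with simple quantization. The example below is an illustration only, not how any streaming service actually encodes media: it keeps the top 8 bits of each 16-bit sample, which halves the storage and makes the loss permanent.

```python
import math

# Hypothetical "audio": one cycle of a sine wave as 16-bit signed samples.
samples = [round(32767 * math.sin(2 * math.pi * i / 64)) for i in range(64)]

KEEP_BITS = 8            # bits kept per sample (assumed for this sketch)
SHIFT = 16 - KEEP_BITS   # low-order bits that get discarded


def quantize(sample: int) -> int:
    """Drop the low-order bits of a 16-bit sample. This step loses information."""
    return sample >> SHIFT


def dequantize(value: int) -> int:
    """Scale back to the 16-bit range. The discarded bits come back as zeros."""
    return value << SHIFT


stored = [quantize(s) for s in samples]       # 8 bits per sample instead of 16
restored = [dequantize(v) for v in stored]    # close to the original, not equal

max_error = max(abs(a - b) for a, b in zip(samples, restored))
print("exact copy:", restored == samples)
print("largest error per sample:", max_error, "out of a range of 65536")
```

Every restored sample lands within 255 of its original value, which is a small error relative to the full range. That is the lossy bargain: half the bits for a result that is close but never identical.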

Common Compression Algorithms You Should Know

If you're building a system, which algorithm should you choose? It depends on what you value most.

| Algorithm | Speed | Compression Ratio | Best For |
| --- | --- | --- | --- |
| Gzip | Medium | Good | Web traffic, general-purpose file compression |
| LZ4 | Very Fast | Low | High-throughput systems, databases, real-time logging |
| Snappy | Very Fast | Low | Developed by Google; used in MapReduce, BigTable, Cassandra |
| Zstandard (Zstd) | Fast | Very Good | Developed by Facebook; arguably the best modern general-purpose compressor |
| Brotli | Slow (Encode) / Fast (Decode) | Excellent | Static web assets where you encode once and decode millions of times |
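LZ4, Snappy, Zstd, and Brotli all need third-party packages in Python, so the sketch below uses the built-in `zlib` module (the DEFLATE algorithm behind Gzip) as a stand-in. Its compression levels show the same speed-versus-ratio dial on a small scale: level 1 favors speed, level 9 favors size. The log records are made up for the example.

```python
import time
import zlib

# Hypothetical structured log lines with some variation between records.
lines = [
    f'{{"ts": {1700000000 + i}, "level": "INFO", "service": "checkout", '
    f'"latency_ms": {i * 7 % 500}}}\n'
    for i in range(50_000)
]
data = "".join(lines).encode("utf-8")


def measure(level: int):
    """Compress data at the given zlib level; return (compressed size, seconds)."""
    start = time.perf_counter()
    size = len(zlib.compress(data, level))
    return size, time.perf_counter() - start


results = {level: measure(level) for level in (1, 6, 9)}

for level, (size, secs) in results.items():
    print(f"level {level}: {size / len(data):.1%} of original size, {secs * 1000:.1f} ms")
```

Timings vary by machine, and this compares settings of one algorithm rather than the algorithms in the table, so use it to build intuition and benchmark the real candidates on your own data before choosing.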

Summary

  • Compression trades CPU cycles for storage space and network bandwidth.
  • Lossless preserves every bit. Lossy sacrifices detail for massive space savings.
  • Algorithms like LZ4 and Snappy prioritize speed, making them great for databases.
  • Don't bother compressing data that's already compressed (like JPEG images or MP4 video). You spend the CPU cycles and get little or no size reduction in return.
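The last point is easy to check for yourself. In the sketch below, random bytes stand in for already-compressed media (an assumption for illustration, since well-compressed data looks statistically close to random), and repetitive text stands in for ordinary uncompressed data.

```python
import random
import zlib

rng = random.Random(42)  # fixed seed so the run is repeatable

# Stand-in for already-compressed media: bytes with no pattern left to exploit.
media_like = bytes(rng.getrandbits(8) for _ in range(100_000))

# Stand-in for ordinary uncompressed data: highly repetitive text.
text_like = b"the quick brown fox jumps over the lazy dog\n" * 2_000

for name, blob in (("text", text_like), ("media-like", media_like)):
    out = zlib.compress(blob)
    print(f"{name}: {len(blob)} bytes -> {len(out)} bytes")
```

The text shrinks dramatically, while the media-like bytes come out slightly larger than they went in because the compressor adds framing overhead and finds nothing to remove.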
