Data Compression

Updated June 3, 2026

Magic Magnets Team

Have you ever tried to stuff an oversized sleeping bag into a tiny stuff sack? It takes effort, but once it's in, it's much easier to carry around. That's data compression in a nutshell.

In system design, moving data around the network is expensive, both in terms of time and actual dollars. Storage is cheaper than it used to be, but it's still not free. That's why we compress data.

The Core Concept: Why Compress?

Think of it this way: if you have a message that says "AAAAABBBBB", why send those 10 characters when you could just say "5A5B"? This trick is called run-length encoding, and it shrinks the message from 10 characters to 4, a 60% reduction.
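
The "5A5B" idea can be sketched in a few lines of Python. This is a toy illustration, not a production codec: the function names `rle_encode` and `rle_decode` are made up for this example, and the decoder assumes the original text contains no digits.

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    """Run-length encode: each run of a repeated character becomes <count><char>."""
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))


def rle_decode(encoded: str) -> str:
    """Invert rle_encode. Assumes the original text contained no digits."""
    out, count = [], ""
    for ch in encoded:
        if ch.isdigit():
            count += ch  # counts may span several digits, e.g. "12A"
        else:
            out.append(ch * int(count))
            count = ""
    return "".join(out)


message = "AAAAABBBBB"
packed = rle_encode(message)
savings = 1 - len(packed) / len(message)
print(packed, f"{savings:.0%}")  # 5A5B 60%
```

Note that this scheme only helps when the input has long runs. A message like "ABC" becomes "1A1B1C", which is twice as long as the original.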

When we talk about data compression in large-scale systems, we're usually making a trade-off. We are trading CPU cycles (the compute power needed to compress and decompress) for network bandwidth and storage space.

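[!NOTE] CPU is often much faster than network I/O. On a slow or congested link, spending a few milliseconds of CPU time to compress a large payload can save hundreds of milliseconds of transfer time. On a very fast link the balance can tip the other way, so measure before you commit.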
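
To make the trade-off concrete, here is a rough sketch that compares an idealised transfer time for a raw payload against the compressed payload plus the CPU time spent compressing it. The 10 Mbit/s link speed, the sample records, and the `transfer_seconds` helper are all assumptions picked for illustration. The model also ignores latency, protocol overhead, and decompression time on the receiving side.

```python
import json
import time
import zlib

# Hypothetical number chosen only for illustration.
LINK_BITS_PER_SECOND = 10_000_000  # assumed 10 Mbit/s link


def transfer_seconds(num_bytes: int) -> float:
    """Idealised transfer time: payload size divided by link bandwidth."""
    return num_bytes * 8 / LINK_BITS_PER_SECOND


# A repetitive JSON payload, similar in spirit to an API response.
records = [{"id": i, "status": "active", "region": "us-east-1"} for i in range(20_000)]
payload = json.dumps(records).encode("utf-8")

start = time.perf_counter()
compressed = zlib.compress(payload)  # default compression level
cpu_seconds = time.perf_counter() - start

raw_time = transfer_seconds(len(payload))
compressed_time = transfer_seconds(len(compressed)) + cpu_seconds

print(f"raw: {len(payload)} bytes, ~{raw_time * 1000:.0f} ms on the wire")
print(f"compressed: {len(compressed)} bytes, ~{compressed_time * 1000:.0f} ms including compression")
```

The exact numbers depend on your machine and your data, so treat the output as a way to reason about the trade-off rather than as a benchmark.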

Two Main Flavors: Lossless vs. Lossy

1. Lossless Compression

With lossless compression, when you decompress the data, you get the exact original data back. Not a single bit is out of place.

Where it's used: Text files, database records, configuration files, executables.
Real-world examples:
- Google uses Brotli and Gzip to compress web assets (HTML, CSS, JS) before sending them to your browser.
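- Databases compress data on disk too: Cassandra compresses SSTable chunks with LZ4 by default (Snappy and Zstd are also options), and PostgreSQL can compress large column values with its built-in pglz or with LZ4.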
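
The "exact original data back" guarantee is easy to check yourself. The sketch below uses Python's standard `gzip` module on a made-up configuration snippet and confirms that the round trip is byte-for-byte identical.

```python
import gzip
import hashlib

# Made-up config content, repeated so there is something to compress.
original = b"server.port=8080\nserver.host=example.internal\n" * 1000

compressed = gzip.compress(original)
restored = gzip.decompress(compressed)

# Lossless means byte-for-byte identical, so the hashes must match too.
same_bytes = restored == original
same_hash = hashlib.sha256(restored).digest() == hashlib.sha256(original).digest()
print(same_bytes, same_hash, len(original), len(compressed))
```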

2. Lossy Compression

With lossy compression, the decompressed data is a close approximation of the original, but some of the less important details are thrown away forever.

Where it's used: Images, video, audio.
Real-world examples:
- Netflix uses highly advanced lossy video compression. When you watch a movie, you aren't seeing every single original pixel from the master copy. The compression algorithm throws away data that the human eye barely notices.
- Spotify compresses audio streams so you can listen seamlessly on a mobile connection.
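
Real audio and video codecs are far more sophisticated than anything that fits here, but the core idea of "throwing details away forever" can be shown with a toy quantizer. The sketch below is an assumption-laden illustration, not how Netflix or Spotify actually encode media: it keeps only the top 8 bits of each 16-bit sample, which halves the size and makes the original unrecoverable.

```python
import math

# One cycle of a sine wave as 16-bit signed samples (a stand-in for audio).
samples = [round(32767 * math.sin(2 * math.pi * i / 100)) for i in range(100)]


def quantize(sample: int, keep_bits: int = 8) -> int:
    """Keep only the top keep_bits of a 16-bit sample; the low bits are discarded."""
    return sample >> (16 - keep_bits)


def dequantize(code: int, keep_bits: int = 8) -> int:
    """Scale a code back to the 16-bit range; the discarded bits come back as zeros."""
    return code << (16 - keep_bits)


codes = [quantize(s) for s in samples]
restored = [dequantize(c) for c in codes]
max_error = max(abs(a - b) for a, b in zip(samples, restored))

print(f"identical: {restored == samples}, largest error: {max_error} out of 32767")
```

The restored signal is close to the original but not equal to it, and no amount of processing can recover the discarded low bits.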

Common Compression Algorithms You Should Know

If you're building a system, which algorithm should you choose? It depends on what you value most.

| Algorithm | Speed | Compression Ratio | Best For |
| --- | --- | --- | --- |
| Gzip | Medium | Good | Web traffic, general purpose file compression |
| LZ4 | Very Fast | Low | High-throughput systems, databases, real-time logging |
| Snappy | Very Fast | Low | Developed by Google; used in MapReduce, BigTable, Cassandra |
| Zstandard (Zstd) | Fast | Very Good | Developed by Facebook; arguably the best modern general-purpose compressor |
| Brotli | Slow (Encode) / Fast (Decode) | Excellent | Static web assets where you encode once and decode millions of times |
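
Most of the algorithms in the table need third-party packages in Python, so the sketch below uses the standard library's `zlib` (the DEFLATE algorithm behind Gzip) as a stand-in. Its compression levels expose the same speed-versus-ratio dial on a smaller scale: level 1 favors speed, level 9 favors size. The log lines and the `measure` helper are invented for this example.

```python
import time
import zlib

# Repetitive log-like data; real results depend heavily on the input.
data = b"".join(
    f"2026-06-03T12:00:{i % 60:02d}Z INFO request_id={i} status=200 path=/api/items\n".encode()
    for i in range(50_000)
)


def measure(level: int) -> tuple[int, float]:
    """Return (compressed size in bytes, seconds spent) for one zlib level."""
    start = time.perf_counter()
    size = len(zlib.compress(data, level))
    return size, time.perf_counter() - start


for level in (1, 6, 9):
    size, seconds = measure(level)
    print(f"level {level}: {size} bytes ({size / len(data):.1%}) in {seconds * 1000:.1f} ms")
```

Running the same kind of measurement on a sample of your own data is usually the most reliable way to choose between algorithms.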

Summary

- Compression trades CPU cycles for storage space and network bandwidth.
- Lossless preserves every bit. Lossy sacrifices detail for massive space savings.
- Algorithms like LZ4 and Snappy prioritize speed, making them great for databases.
- Don't bother compressing data that's already compressed (like media files).
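
The last point is easy to verify. In the sketch below, compressed output stands in for an already-compressed media file: a second pass over it burns CPU and gains essentially nothing. The word list and fixed seed are arbitrary choices made so the run is repeatable.

```python
import random
import zlib

rng = random.Random(42)  # fixed seed so the run is repeatable
words = [b"cache", b"shard", b"replica", b"queue", b"index", b"latency", b"quorum", b"leader"]
text = b" ".join(rng.choice(words) for _ in range(50_000))

once = zlib.compress(text)
twice = zlib.compress(once)  # compressing already-compressed bytes

print(f"original: {len(text)}  compressed once: {len(once)}  compressed twice: {len(twice)}")
```

The first pass shrinks the text substantially, while the second pass typically leaves the size about the same or slightly larger.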
