Erasure Coding

Updated June 3, 2026
M
Magic Magnets Team
7 min read

Let's say you have a highly confidential, top-secret document. You want to give it to your 5 most trusted friends. If you give a full copy to each friend, anyone of them could leak it.

What if, instead, you used a mathematical magic trick to chop the document into 5 pieces, plus 2 "math formula" pieces? You give one piece to each of 7 friends. The magic is this: any 5 friends can get together and perfectly reconstruct the original document. It doesn't matter which 5.

This mathematical magic trick is called Erasure Coding.

The Core Concept: Moving Beyond Simple Replication

algobase.dev
Erasure coding and replication both protect against node failures, but with very different storage costs. Simple 3x replication stores three complete copies of every byte: 1 TB of data requires 3 TB of disk, and you can survive two simultaneous node failures. It's simple and recovery is fast (just copy from a surviving replica), but the 3x overhead becomes enormously expensive at petabyte scale. Erasure coding (specifically a 4+2 scheme here) achieves the same fault tolerance — surviving any two node failures — with only 1.5x overhead. For a company storing 1 exabyte, the difference between 3.0x and 1.5x replication factor is 500 petabytes of hardware. This is why S3, Backblaze, and HDFS use erasure coding for the bulk of their stored data.
1 / 1

Replication vs erasure coding — 3x vs 1.5x storage overhead

In distributed systems, hardware fails all the time. To prevent data loss, the easiest solution is Replication (copying the exact same data to multiple servers). If you replicate data 3 times (3x replication), you can survive 2 server failures. But storing 1 Petabyte of user data requires 3 Petabytes of hard drives!

Erasure Coding gives us fault tolerance similar to 3x replication, but uses a fraction of the storage space.

How it Works

algobase.dev
A 4+2 erasure code (expressed as N=K+M, where K=4 data and M=2 parity) works as follows: a 4 MB file is split into four 1 MB data shards. Using Galois field arithmetic, two 1 MB parity shards are computed — mathematical checksums over the data. All six shards are distributed to six different storage nodes. The key property: any four of the six shards are sufficient to reconstruct the original file, regardless of which four you have. You can lose any two nodes simultaneously and still recover everything. Recovery does cost CPU (running the decoding algorithm) and network bandwidth (downloading surviving shards to reconstruct the lost one) — this is the trade-off vs replication. Erasure coding is used for cold or warm data; hot data with frequent reads often still uses replication to avoid reconstruction latency.
1 / 1

4+2 erasure scheme — any 4 of 6 shards reconstruct the file

Erasure coding breaks data into data chunks and then calculates additional parity chunks using advanced algebra.

It's usually expressed as an equation: N = K + M

  • K = The original data chunks
  • M = The parity (extra math) chunks
  • N = The total number of chunks distributed across servers
Quiz Time

In the N = K + M formula for erasure coding, what does M represent?

If we use a 4 + 2 scheme:

  1. We split a 4MB file into four 1MB chunks (K=4).
  2. We calculate two 1MB parity chunks (M=2).
  3. We now have 6 chunks total, and distribute them to 6 different servers.

Here is the brilliant part: You only need ANY 4 of those 6 chunks to reconstruct the file. You can lose any 2 servers and your data is safe.

Quiz Time

In erasure coding, if you use a 4+2 scheme and store chunks on 6 servers, how many servers can fail before data is lost?

The storage overhead here is only 1.5x, compared to 3.0x for simple 3-way replication!

Quiz Time

What is the storage overhead of a 4+2 erasure coding scheme compared to the original data size?

Real-World Examples

  • Amazon S3: Have you ever wondered how S3 offers "11 nines" (99.999999999%) of durability so cheaply? They heavily rely on erasure coding behind the scenes.
  • Backblaze: The cloud backup company uses a 17+3 erasure coding scheme across their storage pods. They can lose 3 entire storage pods out of 20 and still rebuild all customer data.
  • Hadoop (HDFS 3.0): HDFS introduced erasure coding to cut storage costs in half for "cold" data.
Quiz Time

Amazon S3 achieves "11 nines" of durability partly by relying on erasure coding rather than simple replication.

The Trade-off: CPU and Network vs. Storage

Because the "math magic" requires compute power, reconstructing a lost chunk requires downloading surviving chunks over the network and running CPU-intensive mathematical operations.

[!WARNING] Erasure coding trades CPU cycles and network bandwidth during recovery for cheap storage during normal operations.

Quiz Time

What is the main cost erasure coding imposes compared to simple replication when a chunk is lost and must be recovered?

It is typically used for cold storage (data rarely accessed) or massive objects. For hot databases needing sub-millisecond latency, simple replication is still king.

Quiz Time

Erasure coding is generally preferred over replication for hot databases that require sub-millisecond read latency.

Summary

  • Erasure coding provides fault tolerance with much lower storage overhead than replication.
  • Data is broken into chunks, parity chunks are calculated, and they are spread across servers.
  • It saves money on hard drives but increases CPU and network usage during data recovery.

How helpful was this content?

Comments

0/2000

Sign in to join the discussion

Saved on this device only

Sign in to sync progress across devices