Erasure Coding
Updated June 3, 2026Let's say you have a highly confidential, top-secret document. You want to give it to your 5 most trusted friends. If you give a full copy to each friend, anyone of them could leak it.
What if, instead, you used a mathematical magic trick to chop the document into 5 pieces, plus 2 "math formula" pieces? You give one piece to each of 7 friends. The magic is this: any 5 friends can get together and perfectly reconstruct the original document. It doesn't matter which 5.
This mathematical magic trick is called Erasure Coding.
The Core Concept: Moving Beyond Simple Replication
Replication vs erasure coding — 3x vs 1.5x storage overhead
In distributed systems, hardware fails all the time. To prevent data loss, the easiest solution is Replication (copying the exact same data to multiple servers). If you replicate data 3 times (3x replication), you can survive 2 server failures. But storing 1 Petabyte of user data requires 3 Petabytes of hard drives!
Erasure Coding gives us fault tolerance similar to 3x replication, but uses a fraction of the storage space.
How it Works
4+2 erasure scheme — any 4 of 6 shards reconstruct the file
Erasure coding breaks data into data chunks and then calculates additional parity chunks using advanced algebra.
It's usually expressed as an equation: N = K + M
- K = The original data chunks
- M = The parity (extra math) chunks
- N = The total number of chunks distributed across servers
In the N = K + M formula for erasure coding, what does M represent?
If we use a 4 + 2 scheme:
- We split a 4MB file into four 1MB chunks (K=4).
- We calculate two 1MB parity chunks (M=2).
- We now have 6 chunks total, and distribute them to 6 different servers.
Here is the brilliant part: You only need ANY 4 of those 6 chunks to reconstruct the file. You can lose any 2 servers and your data is safe.
In erasure coding, if you use a 4+2 scheme and store chunks on 6 servers, how many servers can fail before data is lost?
The storage overhead here is only 1.5x, compared to 3.0x for simple 3-way replication!
What is the storage overhead of a 4+2 erasure coding scheme compared to the original data size?
Real-World Examples
- Amazon S3: Have you ever wondered how S3 offers "11 nines" (99.999999999%) of durability so cheaply? They heavily rely on erasure coding behind the scenes.
- Backblaze: The cloud backup company uses a 17+3 erasure coding scheme across their storage pods. They can lose 3 entire storage pods out of 20 and still rebuild all customer data.
- Hadoop (HDFS 3.0): HDFS introduced erasure coding to cut storage costs in half for "cold" data.
Amazon S3 achieves "11 nines" of durability partly by relying on erasure coding rather than simple replication.
The Trade-off: CPU and Network vs. Storage
Because the "math magic" requires compute power, reconstructing a lost chunk requires downloading surviving chunks over the network and running CPU-intensive mathematical operations.
[!WARNING] Erasure coding trades CPU cycles and network bandwidth during recovery for cheap storage during normal operations.
What is the main cost erasure coding imposes compared to simple replication when a chunk is lost and must be recovered?
It is typically used for cold storage (data rarely accessed) or massive objects. For hot databases needing sub-millisecond latency, simple replication is still king.
Erasure coding is generally preferred over replication for hot databases that require sub-millisecond read latency.
Summary
- Erasure coding provides fault tolerance with much lower storage overhead than replication.
- Data is broken into chunks, parity chunks are calculated, and they are spread across servers.
- It saves money on hard drives but increases CPU and network usage during data recovery.
How helpful was this content?
Comments
Sign in to join the discussion
Saved on this device only
Sign in to sync progress across devices