Denormalization

Updated June 3, 2026

Magic Magnets Team

7 min read

If you took a database 101 class, you had Normalization drilled into your head. The golden rule of relational databases is: Do not repeat yourself (DRY). Every piece of data should live in exactly one place.

If you have an e-commerce store, your Orders table shouldn't contain the customer's name and email address. It should just contain a customer_id. To get the name, you JOIN the Orders table with the Customers table. This ensures that if the customer updates their email, you only have to change it in one row.

Normalization is beautiful, clean, and minimizes storage space.

But at a massive scale, Normalization can become your worst enemy.

The Problem with Joins

Joins are computationally expensive. When a database executes a join across three massive tables, it has to load data from different areas of the disk, match up the keys in memory, and construct a new combined view on the fly.

Imagine rendering a tweet on Twitter. To show a single tweet, a normalized database would need to join:

The Tweets table (for the text).
The Users table (for the author's display name and avatar).
The Likes table (to count how many likes it has).
The Retweets table (to count retweets).

If you are rendering a timeline of 50 tweets, doing those massive multi-table joins millions of times a second is impossible. The CPU will max out, and the database will crash.

To survive, system designers intentionally break the rules. They introduce Denormalization.

Quiz Time

Why do JOINs become a serious performance problem at massive scale?

What is Denormalization?

Denormalization is the process of intentionally adding redundant data to your database to improve read performance. You stop trying to save space, and you start optimizing for speed.

Instead of joining tables on the fly, you pre-bake the data together.

In our Twitter example, denormalization means altering the Tweets table so it looks like this:

tweet_id
text
author_id
author_display_name (Redundant!)
author_avatar_url (Redundant!)
like_count (Redundant!)
retweet_count (Redundant!)

Now, to render the timeline, the application executes a simple SELECT * FROM Tweets. Zero joins. The database just sequential-reads the exact data needed and hands it to the user in a fraction of a millisecond.

Quiz Time

Denormalization improves read performance by pre-baking related data into a single table, eliminating the need for JOINs at query time.

The Heavy Cost: Write Amplification and Complexity

Denormalization trades read complexity for write complexity. You are shifting the burden from read-time to write-time.

Because data is duplicated, keeping it consistent is a nightmare.

If a user changes their display name, in a normalized database, you update one row in the Users table. Easy. In our denormalized system, you have to update the Users table, AND you have to run a massive background job to update the author_display_name column on the 10,000 tweets that user has historically posted.

If someone likes a tweet, you can't just insert a row into the Likes table. You also have to increment the like_count integer on the Tweets table.

Quiz Time

What is "write amplification" in the context of denormalization?

Eventual Consistency

Often, systems accept that denormalized data will be slightly out of sync. When you change your Twitter display name, it might take a few minutes for your old tweets to show the new name. The system updates the core Users table immediately, and queues up a background worker to slowly scrub through the Tweets table and fix the redundant data. This is called Eventual Consistency.

Quiz Time

When a user changes their Twitter display name, the old tweets immediately reflect the new name in a denormalized system.

When to Denormalize

You should not denormalize your database on day one. It makes your application logic incredibly complex and error-prone.

You should only denormalize when:

Your system is incredibly Read-Heavy. The cost of the complex writes is outweighed by the massive savings on the millions of reads.
Caching isn't enough. You've tried putting Redis in front of the database, but the join queries are still too slow or complex.
You are moving to NoSQL. Databases like Cassandra, DynamoDB, and MongoDB don't support SQL joins natively. In NoSQL, denormalization isn't just an optimization. It's a strict requirement for the data model. You must structure your data exactly how the application intends to read it.

Quiz Time

Which of the following is a valid reason to denormalize a database?

Quiz Time

Denormalization is optional in NoSQL databases like Cassandra or DynamoDB.

Summary

Normalization prevents data duplication and makes updates easy, but requires slow, expensive JOIN operations to read.
Denormalization intentionally adds redundant, duplicated data to tables to eliminate joins and make reads blazingly fast.
It shifts the computational burden from read-time to write-time.
Updating denormalized data is complex and risky, often relying on background jobs and accepting Eventual Consistency.
It is a necessary evil for massive-scale systems and a fundamental requirement when designing schemas for NoSQL databases.

Materialized Views

How helpful was this content?

Comments

0/2000

Saved on this device only