Document Databases

Updated June 3, 2026

Magic Magnets Team

Document databases are probably the most popular NoSQL choice in modern web development, and also the most misused. MongoDB (and Firestore, and CouchDB) solve real problems brilliantly. But the "just use MongoDB, it's flexible" mentality has led to a lot of painful data models that developers regret later.

Let's understand what document databases actually do well, when to reach for them, and where they'll burn you if you're not careful.

What Is a Document?

In a document database, the basic unit of storage is a document — a self-contained JSON (or BSON in MongoDB's case) object. There are no fixed schemas. Documents in the same collection can have completely different shapes.

Here's an example: two product documents in an e-commerce catalog.

// A t-shirt
{
  "_id": "prod-001",
  "name": "Classic Tee",
  "category": "apparel",
  "sizes": ["XS", "S", "M", "L", "XL"],
  "colors": ["black", "white", "navy"],
  "price": 29.99
}

// A laptop
{
  "_id": "prod-002",
  "name": "ProBook 14",
  "category": "electronics",
  "specs": {
    "ram": "16GB",
    "storage": "512GB SSD",
    "display": "14-inch 2K"
  },
  "warranty_years": 2,
  "price": 1299.99
}

These two documents have completely different fields. In a relational database, you'd need to either have a table with dozens of nullable columns (messy) or a complex EAV (Entity-Attribute-Value) pattern (even messier). In a document database, this is just natural.
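
To make that concrete, here is a minimal Python sketch that treats a plain list of dictionaries as a stand-in for a collection. The `catalog` list and the `find` helper are assumptions made for this example, not a real database API. The point is only that one query can run over documents with different shapes.

```python
# In-memory stand-in for a "products" collection (not a real database).
catalog = [
    {"_id": "prod-001", "name": "Classic Tee", "category": "apparel",
     "sizes": ["XS", "S", "M", "L", "XL"], "colors": ["black", "white", "navy"],
     "price": 29.99},
    {"_id": "prod-002", "name": "ProBook 14", "category": "electronics",
     "specs": {"ram": "16GB", "storage": "512GB SSD", "display": "14-inch 2K"},
     "warranty_years": 2, "price": 1299.99},
]


def find(collection, **criteria):
    """Return documents whose top-level fields equal every given criterion.

    A document that lacks a field simply does not match; nothing fails.
    """
    missing = object()  # sentinel so a stored None is not confused with "absent"
    return [doc for doc in collection
            if all(doc.get(key, missing) == value for key, value in criteria.items())]


# Shared fields work across both shapes; shape-specific fields match only some documents.
under_100 = [doc["name"] for doc in catalog if doc["price"] < 100]
with_warranty = find(catalog, warranty_years=2)
```

The t-shirt has no `warranty_years` field at all, and the query still works: the document just doesn't match.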

Flexible Schema: A Double-Edged Sword

The schema-less nature of document databases is genuinely useful in some scenarios:

Product catalogs where different product types have radically different attributes
User profiles that evolve over time — you add preferences, then theme, then notification_settings without migrations
CMS content where articles, events, and pages all have different fields

But "schema-less" doesn't mean your data has no schema. It means the schema is enforced by your application code instead of the database. And application-enforced schemas are harder to maintain, harder to validate consistently, and easier to break across services.

With a relational database, the schema is a contract. With a document database, you have to write your own contract — and keep all your services in sync on it.

Modern document databases have caught on to this. MongoDB has JSON Schema validation. Firestore has security rules that can validate field types. Use them.
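
As a rough illustration of what such a rule looks like, the sketch below builds a validator document in MongoDB's `$jsonSchema` format, plus a tiny hand-written checker so the example runs without a database. The field choices and the `check` helper are assumptions made for this example. `check` handles only `required` and a few `bsonType` values and is not a full JSON Schema implementation.

```python
# Validator document in MongoDB's $jsonSchema format. The fields are assumptions
# based on the product examples earlier in this article.
product_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["name", "category", "price"],
        "properties": {
            "name": {"bsonType": "string"},
            "category": {"bsonType": "string"},
            "price": {"bsonType": "double"},
        },
    }
}

# Maps the few bsonType names used here to Python types (illustrative only).
PYTHON_TYPES = {"string": str, "double": float, "object": dict}


def check(doc, validator):
    """Return a list of problems; an empty list means the document passes."""
    schema = validator["$jsonSchema"]
    problems = [f"missing field: {name}" for name in schema["required"] if name not in doc]
    for name, rule in schema["properties"].items():
        if name in doc and not isinstance(doc[name], PYTHON_TYPES[rule["bsonType"]]):
            problems.append(f"wrong type for {name}: expected {rule['bsonType']}")
    return problems


good = {"name": "Classic Tee", "category": "apparel", "price": 29.99}
bad = {"name": "Mystery Item", "price": "29.99"}  # no category, price stored as text

good_problems = check(good, product_validator)
bad_problems = check(bad, product_validator)
```

In a real deployment, a document like `product_validator` would be attached to the collection itself (in PyMongo, for example, through the `validator` option when creating the collection), so the database rejects `bad` no matter which service tries to write it.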

Embedding vs Referencing: The Core Design Decision

In a relational database, you normalize: related data lives in separate tables, linked by foreign keys. In a document database, you have a choice.

Embedding (Denormalizing)

You store related data directly inside the parent document.

{
  "_id": "order-001",
  "user_id": "user-42",
  "items": [
    { "product_id": "prod-001", "name": "Classic Tee", "quantity": 2, "price": 29.99 },
    { "product_id": "prod-002", "name": "ProBook 14", "quantity": 1, "price": 1299.99 }
  ],
  "total": 1359.97,
  "status": "shipped"
}

When to embed: the embedded data is always accessed together with the parent, the relationship is one-to-few, and the embedded array won't grow without bound. Think: order items, user addresses, and short, capped comment threads.
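
One practical consequence of embedding is that the order is self-describing: the total can be rechecked from the document alone, with no second lookup. The sketch below does that in Python, using `Decimal` because summing binary floats for money can introduce rounding error. The `order_total` helper is a hypothetical name introduced for this example.

```python
from decimal import Decimal

order = {
    "_id": "order-001",
    "user_id": "user-42",
    "items": [
        {"product_id": "prod-001", "name": "Classic Tee", "quantity": 2, "price": 29.99},
        {"product_id": "prod-002", "name": "ProBook 14", "quantity": 1, "price": 1299.99},
    ],
    "total": 1359.97,
    "status": "shipped",
}


def order_total(doc):
    """Sum quantity * price over the embedded items, using exact decimal arithmetic."""
    # str() gives the short decimal form of each float (e.g. "29.99") before conversion.
    return sum((Decimal(str(item["price"])) * item["quantity"] for item in doc["items"]),
               Decimal("0"))


computed = order_total(order)
matches_stored_total = computed == Decimal(str(order["total"]))
```

The flip side is visible too: the item names and prices are copies. If a product is later renamed or repriced, this order keeps the old values, which is usually what you want for orders and usually not what you want for shared data.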

Referencing (Normalizing)

You store a reference (ID) to the related document, similar to a foreign key.

{
  "_id": "post-001",
  "title": "My First Post",
  "author_id": "user-42",   // reference, not embedded
  "comment_ids": ["comment-1", "comment-2"]
}

When to reference: the related data is large, changes frequently, or is shared across multiple parents, or the relationship is many-to-many. Think: authors of posts, products in wishlists, users in groups.

The trade-off: embedding is fast (single read), but duplicates data. Referencing avoids duplication but requires multiple reads or application-side joins.
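
The read-count difference can be sketched with an in-memory stand-in. The `Collection` class, the `load_post` helper, and the user and comment contents below are all made up for this example; they only imitate the shape of the problem, not any real driver.

```python
class Collection:
    """Tiny in-memory collection that counts how many reads it serves."""

    def __init__(self, docs):
        self._docs = {doc["_id"]: doc for doc in docs}
        self.reads = 0

    def get(self, doc_id):
        self.reads += 1
        return self._docs[doc_id]


# Embedded design: the order already contains its line items.
orders = Collection([
    {"_id": "order-001", "items": [{"product_id": "prod-001", "name": "Classic Tee"}]},
])

# Referenced design: the post stores only IDs.
posts = Collection([
    {"_id": "post-001", "title": "My First Post", "author_id": "user-42",
     "comment_ids": ["comment-1", "comment-2"]},
])
users = Collection([{"_id": "user-42", "name": "Ada"}])
comments = Collection([
    {"_id": "comment-1", "text": "Nice post"},
    {"_id": "comment-2", "text": "Thanks for sharing"},
])


def load_post(post_id):
    """Application-side join: fetch the post, then follow each reference."""
    post = posts.get(post_id)
    author = users.get(post["author_id"])
    thread = [comments.get(cid) for cid in post["comment_ids"]]
    return {**post, "author": author, "comments": thread}


order = orders.get("order-001")   # one read returns everything
post = load_post("post-001")      # one read per document touched
embedded_reads = orders.reads
referenced_reads = posts.reads + users.reads + comments.reads
```

Here the embedded order costs one read and the referenced post costs four. A real application would usually batch the comment reads into a single query (in MongoDB, for example, an `$in` filter on `_id`), but it is still more than one round-trip.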

When Document Databases Shine

Product Catalogs

Arguably the best use case. When each product type has completely different attributes — a book has an ISBN and page count; a laptop has RAM and storage; a t-shirt has sizes and colors — a rigid relational schema is painful. Documents are natural.

User Profiles

A user profile document that starts as { name, email } and grows to include preferences, social connections, notification settings, and theme options is a perfect fit. You evolve the document without table migrations.
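
Skipping migrations does shift some work to read time: older documents simply lack the newer fields, so readers have to supply defaults. A minimal sketch of that pattern follows. The profile data, the default values, and the `preferences_for` helper are assumptions made for this example.

```python
# Documents written at different times; older ones lack newer fields.
profiles = [
    {"_id": "user-1", "name": "Ada", "email": "ada@example.com"},
    {"_id": "user-2", "name": "Lin", "email": "lin@example.com",
     "preferences": {"theme": "dark"}},
]

# Defaults applied at read time (hypothetical values for this example).
DEFAULT_PREFERENCES = {"theme": "light", "notifications": True}


def preferences_for(profile):
    """Merge stored preferences over the defaults without modifying the document."""
    return {**DEFAULT_PREFERENCES, **profile.get("preferences", {})}


old_style = preferences_for(profiles[0])
new_style = preferences_for(profiles[1])
```

Both profiles come back with a complete set of preferences, and the stored documents are left untouched.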

Content Management Systems

Blog posts, landing pages, events, and announcements each have different fields. Storing them as documents lets you iterate on content structure quickly. Contentful and Sanity (modern headless CMSes) are essentially document databases with a nice UI on top.

Real-Time Collaborative Applications

Firestore's real-time listeners make it easy to build collaborative features (a live document editor, a shared todo list, a multi-user dashboard) where changes propagate to all connected clients in near real time.

Real Databases

MongoDB is the most mature and widely deployed document database. It supports rich querying, aggregation pipelines, multi-document ACID transactions (added in v4.0 for replica sets and extended to sharded clusters in v4.2), and a massive ecosystem. Atlas (the managed cloud offering) is excellent. If you're choosing a document database for a general-purpose backend, MongoDB is the default.

Firestore (part of Firebase) is Google's managed, serverless document database. It has real-time sync built in, scales to zero, and integrates seamlessly with Firebase Auth and Cloud Functions. The data model has a "collections and subcollections" hierarchy that maps nicely to hierarchical data. It's an excellent choice for mobile and web apps that want a zero-ops backend.

CouchDB has a fascinating offline-first sync model (CouchDB replication) and uses HTTP/JSON as its native API. It's not as commonly used today but is worth knowing for offline-capable apps.

Pitfalls: What Nobody Tells You

No Native Joins

This is the biggest one. Most document databases offer little or no join support at the database layer. If your data is deeply relational (orders referencing products referencing categories referencing suppliers), you're either embedding everything (fat documents) or making multiple round-trips from application code.

MongoDB has $lookup for aggregation pipelines, which is essentially a join, but it's not as expressive or performant as SQL joins on indexed columns.
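
For a sense of what that looks like, here is a `$lookup` stage written as a Python data structure, together with a small in-memory imitation so the example runs without a server. The collection and field names follow the post example from earlier in this article. The `emulate_lookup` helper and the sample users are assumptions for illustration, and the helper covers only the simple equality form of `$lookup`.

```python
# Aggregation pipeline in MongoDB's format: attach the author to each post.
pipeline = [
    {"$match": {"_id": "post-001"}},
    {"$lookup": {
        "from": "users",            # collection to join with
        "localField": "author_id",  # field on the post
        "foreignField": "_id",      # field on the user
        "as": "author",             # output array field
    }},
]


def emulate_lookup(docs, foreign_docs, stage):
    """In-memory imitation of an equality $lookup, for illustration only."""
    spec = stage["$lookup"]
    joined = []
    for doc in docs:
        matches = [f for f in foreign_docs
                   if f.get(spec["foreignField"]) == doc.get(spec["localField"])]
        joined.append({**doc, spec["as"]: matches})
    return joined


posts = [{"_id": "post-001", "title": "My First Post", "author_id": "user-42"}]
users = [{"_id": "user-42", "name": "Ada"}, {"_id": "user-7", "name": "Lin"}]

result = emulate_lookup(posts, users, pipeline[1])
```

Note that the joined field is always an array, even when only one user matches. With PyMongo, the same `pipeline` list would be passed to a collection's `aggregate` method.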

Consistency Trade-offs

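MongoDB's default guarantees were weak for years: before version 5.0 the default write concern acknowledged a write once the primary alone had applied it, so an acknowledged write could still be rolled back after a failover, and reads routed to secondaries can return stale data. Firestore serves strongly consistent reads and queries from its backend, but clients reading from the local offline cache can still see stale data. If you need ACID transactions across multiple documents, you can get them in modern MongoDB, but they are opt-in and come with performance costs.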

Unbounded Arrays

Embedding is tempting, but arrays inside documents can grow without bound. A post document with an embedded comments array is fine for 10 comments. At 10,000 comments, it's a different story: the document grows to megabytes, becomes slow to read and slow to update, and runs into hard size limits (16 MB per document in MongoDB, 1 MiB in Firestore). Design with growth in mind.
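
A quick way to get a feel for the growth is to measure it. The sketch below approximates document size by the length of its JSON encoding, which is only a rough proxy for what a real database stores. The 200-character comment body and the helper names are assumptions made for this example.

```python
import json


def encoded_size(doc):
    """Approximate a document's stored size by its JSON encoding, in bytes."""
    return len(json.dumps(doc).encode("utf-8"))


def make_comment(i, post_id=None):
    # A made-up comment with a 200-character body.
    comment = {"_id": f"comment-{i}", "author_id": "user-42", "text": "x" * 200}
    if post_id is not None:
        comment["post_id"] = post_id  # back-reference used by the referenced design
    return comment


# Embedded design: every comment makes the post document bigger.
embedded_post = {"_id": "post-001", "title": "My First Post", "comments": []}
sizes = {}
for count in (10, 10_000):
    embedded_post["comments"] = [make_comment(i) for i in range(count)]
    sizes[count] = encoded_size(embedded_post)

# Referenced design: the post stays small; each comment is its own small document.
referenced_post = {"_id": "post-001", "title": "My First Post"}
one_comment = make_comment(0, post_id="post-001")

FIRESTORE_LIMIT = 1_048_576  # Firestore's 1 MiB per-document limit, in bytes
over_firestore_limit = sizes[10_000] > FIRESTORE_LIMIT
```

With these assumptions the 10,000-comment post already exceeds Firestore's per-document limit, while in the referenced design the post stays tiny and each comment is a few hundred bytes on its own.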

Schema Drift

Without schema enforcement, different parts of your codebase start writing slightly different document shapes. A user created by the mobile app might look different from one created by the web app. This is manageable with discipline and tooling (Mongoose schemas, Zod validation, Firestore security rules), but it requires intentional effort.
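
Drift is also something you can audit for. One simple approach, sketched below, is to group documents by their "shape" (field names and value types) and flag anything that differs from the most common one. The sample users and the `shape` helper are made up for this example, and a real audit would run against a sample of the actual collection.

```python
from collections import Counter

# Made-up user documents written by two different clients.
users = [
    {"_id": "u1", "name": "Ada", "email": "ada@example.com"},            # web app
    {"_id": "u2", "name": "Lin", "email": "lin@example.com"},            # web app
    {"_id": "u3", "fullName": "Sam Roe", "email": "sam@example.com"},    # mobile app
    {"_id": "u4", "name": "Kim", "email": None, "phone": "555-0100"},    # mobile app
]


def shape(doc):
    """Describe a document by its sorted (field, type name) pairs."""
    return tuple(sorted((key, type(value).__name__) for key, value in doc.items()))


shape_counts = Counter(shape(doc) for doc in users)
most_common_shape, _ = shape_counts.most_common(1)[0]
drifted_ids = [doc["_id"] for doc in users if shape(doc) != most_common_shape]
```

Here `u3` is flagged for using `fullName` instead of `name`, and `u4` for a null email and an extra `phone` field. Whether those are bugs or legitimate variations is a judgment call, but at least the differences are visible.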

Embedding Decision Cheat Sheet

Relationship	Access Pattern	Recommendation
Post → author	Usually accessed separately	Reference
Order → line items	Always together	Embed
User → addresses (few)	Together	Embed
Post → comments (unbounded)	Sometimes separate	Reference
Product → category	Category shared across many	Reference
User → settings	Always together	Embed
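
The reasoning behind the table can be written down as a small rule of thumb. The `recommend` function below is one possible encoding, not an official algorithm, and the true/false reading of each row is this example's interpretation of the table.

```python
def recommend(accessed_together, bounded, shared_across_parents):
    """Return "Embed" or "Reference" for a relationship, using the cheat sheet's heuristics."""
    # Shared or unbounded data is referenced regardless of how it is accessed.
    if shared_across_parents or not bounded:
        return "Reference"
    return "Embed" if accessed_together else "Reference"


# (relationship, accessed_together, bounded, shared_across_parents)
rows = [
    ("Post -> author", False, True, True),
    ("Order -> line items", True, True, False),
    ("User -> addresses (few)", True, True, False),
    ("Post -> comments (unbounded)", False, False, False),
    ("Product -> category", False, True, True),
    ("User -> settings", True, True, False),
]

recommendations = {name: recommend(*flags) for name, *flags in rows}
```

Embedding only wins when all three conditions line up: the data is read together with its parent, it stays small, and no other parent needs it.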

Summary

Document databases store self-contained JSON documents with flexible schemas, which makes them ideal for hierarchical data, product catalogs, user profiles, and CMS content. The core design decision is embedding (fast, but denormalized) vs referencing (normalized, but requires multiple reads). MongoDB is the default choice for general-purpose document storage; Firestore is excellent for real-time, serverless mobile and web backends. The pitfalls are real: limited join support forces you to think carefully about your access patterns upfront, unbounded arrays create performance problems, and schema drift across services requires active management. Use document databases where flexibility genuinely helps, not as a default because SQL "seems complicated."
