Indexing

Updated June 3, 2026
M
Magic Magnets Team
9 min read

Without an index, a database has one option when you query it: read every single row until it finds what you're looking for. For a table with a thousand rows, that's fine. For a table with 500 million rows, that's catastrophically slow. Indexes are the solution, and understanding how they work is one of the highest-leverage things you can know about databases.

What Is an Index?

An index is a separate data structure the database maintains alongside your table. It stores a subset of your columns in a form that's fast to search, plus a pointer back to the full row on disk. Instead of scanning all 500 million rows, the database goes to the index, finds the matching entries in O(log n) time, then fetches just those rows.

Think of it like the index at the back of a textbook. You don't re-read the whole book to find where "consistent hashing" appears — you flip to the index, see page 247, and go directly there.

The tradeoff: every index you create takes up storage and must be updated every time you insert, update, or delete a row.

B-Tree Indexes: The Workhorse

The overwhelming majority of indexes in relational databases (PostgreSQL, MySQL, SQLite) are B-Tree indexes. A B-Tree is a self-balancing tree structure where:

  • Every leaf node holds actual data pointers sorted by the indexed value
  • Interior nodes hold separator keys to guide search
  • The tree stays balanced, so searches always take O(log n) comparisons
  • Leaf nodes are linked together, making range scans efficient

When you run SELECT * FROM users WHERE email = 'alice@example.com', the database traverses the B-Tree from root to leaf in a handful of comparisons — even across billions of rows. When you run WHERE created_at BETWEEN '2024-01-01' AND '2024-06-01', it finds the start of the range and follows the linked leaf nodes sequentially.

PostgreSQL also has Hash indexes (good for exact equality, bad for ranges), GIN indexes (for full-text search and JSONB), and GiST indexes (for geometric data). But B-Tree is the default for good reason — it's a Swiss Army knife that handles most query shapes well.

Quiz Time

What is the time complexity for a lookup in a B-Tree index?

Index Types You Should Know

Composite Indexes

A composite index covers multiple columns: CREATE INDEX idx_orders_user_date ON orders(user_id, created_at).

The key insight is column order matters. This index can efficiently support:

  • WHERE user_id = 5
  • WHERE user_id = 5 AND created_at > '2024-01-01'

But it cannot efficiently support:

  • WHERE created_at > '2024-01-01' (without user_id)

The leftmost prefix rule: a composite index on (A, B, C) can serve queries on A alone, A+B, or A+B+C. Not B alone or C alone.

Quiz Time

Given a composite index on (user_id, created_at), which query can use this index efficiently?

Covering Indexes

A covering index includes all the columns a query needs — so the database never has to go back to the main table at all.

-- Query: SELECT user_id, email FROM users WHERE status = 'active'; -- Covering index: CREATE INDEX idx_users_status_covering ON users(status) INCLUDE (user_id, email);

The INCLUDE columns don't participate in the B-Tree structure, but they get stored in the leaf nodes. The database reads the index and has everything it needs. This is sometimes called an index-only scan and is dramatically faster for read-heavy workloads.

Quiz Time

A covering index allows the database to answer a query without touching the main table.

Partial Indexes

A partial index only indexes rows matching a condition:

CREATE INDEX idx_orders_pending ON orders(created_at) WHERE status = 'pending';

If 99% of your orders are completed and you only ever query pending ones, why index all 500 million rows? A partial index stays small, fits in memory, and is extremely fast for its specific use case.

Quiz Time

Which of the following index types is best suited for indexing only the small subset of rows that match a specific condition?

Unique Indexes

CREATE UNIQUE INDEX does double duty: it enforces a uniqueness constraint and gives you a fast lookup. Most databases create a unique index automatically when you define a UNIQUE constraint or PRIMARY KEY.

The Cost of Indexes

Here's what textbooks often gloss over: indexes aren't free.

Write Overhead

Every INSERT, UPDATE, or DELETE must update every index on the table. If a table has 8 indexes, an insert touches 9 data structures instead of 1. For write-heavy workloads (logging, event ingestion, high-frequency trading), too many indexes can make writes painfully slow.

This is why data warehouse systems often drop indexes before bulk loads and rebuild them after. Inserting 100 million rows without indexes and then building the index once is orders of magnitude faster than maintaining the index during every insert.

Quiz Time

Bulk-loading data into a table is faster if you drop its indexes first and rebuild them after.

Storage

Indexes can easily double or triple the storage footprint of a table. That's usually acceptable, but it adds up in large systems. If you're storing index data on SSDs at cloud pricing, those bytes cost money.

Index Bloat

B-Trees can develop "bloat" over time — deleted rows leave dead pages in the tree that don't get immediately reclaimed. PostgreSQL's VACUUM and REINDEX operations help, but this is an operational concern you need to plan for at scale.

Common Pitfalls

Too Many Indexes

Every index you add is a bet that the read performance gain is worth the write overhead. Junior engineers often add indexes reactively — one for each slow query they encounter — until the table has 15 indexes and inserts are grinding.

Be deliberate. Profile what queries actually run in production. Drop indexes that haven't been used in months (PostgreSQL tracks this in pg_stat_user_indexes).

Indexes on Low-Cardinality Columns

An index on a gender column with two possible values isn't helpful. The database still has to fetch half the table — at that point, a full table scan might actually be faster because it avoids the overhead of following index pointers.

Rule of thumb: indexes pay off when they help the database skip at least 80-90% of rows.

Quiz Time

Why is adding an index on a low-cardinality column like `status` with only two values often a bad idea?

Missing Indexes on Foreign Keys

This is a classic performance trap. If orders.user_id references users.id, but you forgot to index orders.user_id, then any query joining orders to users by user does a full scan of the orders table for every user. MySQL actually warns you about this; PostgreSQL does not.

Implicit Type Casts Break Indexes

If your column is INTEGER and you query WHERE user_id = '42' (a string), many databases will cast every value in the index to compare — defeating the index entirely. Keep your query types consistent with your column types.

Summary

Indexes are one of the most powerful tools in a database's toolkit, and also one of the most misunderstood. The core idea is simple: maintain a sorted, searchable data structure so you can find rows without scanning everything. B-Tree indexes handle the vast majority of use cases. Composite, covering, and partial indexes are refinements that let you tune for specific query shapes and access patterns.

The discipline is knowing when not to add an index — because every index you create makes writes slower and consumes storage. Profile real query patterns, understand the leftmost prefix rule, and treat indexes as deliberate architectural decisions rather than reflexive reactions to slow queries.

How helpful was this content?

Comments

0/2000

Sign in to join the discussion

Saved on this device only

Sign in to sync progress across devices