Query Optimization

Updated June 3, 2026
M
Magic Magnets Team
9 min read

When you write a SQL query like SELECT * FROM users JOIN orders ON users.id = orders.user_id WHERE users.country = 'USA', you are describing what data you want. You are absolutely not telling the database how to get it.

SQL is a declarative language. The database engine has to figure out the physical steps to execute your request. Should it look at the users table first? Or the orders table? Should it use an index? Should it scan the whole table?

The brains behind these decisions is the Query Optimizer. It is arguably the most complex and critical piece of software inside any relational database.

The Role of the Query Optimizer

Think of the query optimizer like a GPS navigation system (like Google Maps).

You tell the GPS: "I want to go from New York to Boston." The GPS doesn't just draw a straight line. It looks at the map, calculates multiple possible routes, checks the current traffic, notes which roads have tolls, and finally selects the route that will get you there the fastest.

The Query Optimizer does the exact same thing for your data. It generates multiple "Execution Plans," estimates the cost of each plan, and chooses the cheapest one.

Quiz Time

What is the primary job of a Query Optimizer in a relational database?

How It Calculates "Cost"

In database terms, "Cost" is usually a mathematical proxy for Disk I/O (how many times it has to read from the hard drive) and CPU usage.

To estimate the cost, the optimizer relies heavily on Statistics. The database constantly runs background tasks to gather statistics about your tables. It knows:

  • How many rows are in the users table?
  • How many distinct countries are in the country column?
  • Are the countries evenly distributed, or are 90% of your users from the USA?

The Tipping Point: Index Scan vs. Sequential Scan

Let's look at a classic optimization decision. You have an index on the country column, and you run: SELECT * FROM users WHERE country = 'Canada'

If the statistics show that only 2% of your users are from Canada, the optimizer will say, "Great, I'll use the B-Tree index to quickly find those specific rows." This is an Index Scan.

But what if you run: SELECT * FROM users WHERE country = 'USA'

If the statistics know that 90% of your users are from the USA, the optimizer will completely ignore your carefully created index! Why? Because using the index requires jumping back and forth between the index structure and the main table structure on disk. If you have to read 90% of the table anyway, it is actually much faster to just read the entire table start-to-finish in one sweep. This is a Sequential Scan (or Table Scan).

The optimizer is smart enough to know when your index is useless.

Quiz Time

A database optimizer will always use an index on a column if one exists, regardless of the query's selectivity.

Joins: The Heavy Lifting

The optimizer's hardest job is figuring out how to join tables. If you join three tables (A, B, and C), there are many ways to do it:

  • Join A to B, then join the result to C.
  • Join B to C, then join the result to A.
  • Join A to C, then join to B.

As the number of tables increases, the number of possible permutations explodes exponentially.

Quiz Time

As the number of tables in a JOIN increases, how does the number of possible join orderings grow?

The optimizer evaluates the physical algorithms available for joins:

  1. Nested Loop Join: For every row in table A, scan table B. Great if table A is tiny. Terrible if both are huge.
  2. Hash Join: Take the smaller table, build an in-memory hash map of the join keys, then scan the larger table and probe the hash map. Very fast for large, unsorted datasets.
  3. Merge Join: If both tables are already sorted by the join key (perhaps due to an index), just zip them together simultaneously. Extremely efficient.

The optimizer looks at the table sizes and available indexes and picks the winning algorithm.

Quiz Time

Which join algorithm is most efficient when both tables being joined are already sorted by the join key?

Real-World Tools: EXPLAIN

As an engineer, you don't have to guess what the optimizer is doing. Every major database gives you a tool to peek inside its brain: the EXPLAIN command.

By prepending EXPLAIN (or EXPLAIN ANALYZE in Postgres) to your query, the database will output the exact execution plan it chose. It will tell you if it did a sequential scan, what join algorithms it used, and how long each step took.

Learning to read an EXPLAIN plan is a superpower for backend engineers. When a query is slow, the execution plan tells you exactly why (e.g., "Ah, it's doing a sequential scan on a 50-million row table because I forgot an index!").

Quiz Time

What does the database use to decide whether a Sequential Scan or an Index Scan is cheaper for a given query?

Quiz Time

The EXPLAIN command lets engineers see the exact execution plan the optimizer chose for a query.

Summary

  • SQL tells the database what you want; the Query Optimizer decides how to physically retrieve it.
  • The optimizer calculates the "Cost" of multiple execution plans and picks the cheapest one to minimize Disk I/O and CPU.
  • It relies on Statistics about your data distribution. It may intentionally ignore an index if scanning the whole table is deemed faster.
  • It determines the most efficient order and physical algorithms (Nested Loop, Hash, Merge) for executing Joins.
  • Engineers use the EXPLAIN command to diagnose slow queries and understand the optimizer's decisions.

Read Replicas

How helpful was this content?

Comments

0/2000

Sign in to join the discussion

Saved on this device only

Sign in to sync progress across devices