Relational Databases
Updated June 3, 2026Relational databases have been the backbone of software engineering for over 50 years. Every time you log into a website, place an order, or send a message, there's a very good chance a relational database is involved somewhere in that chain. PostgreSQL, MySQL, SQLite — you'll use these throughout your career, so understanding them deeply is worth the investment.
Here's the thing though: most engineers treat relational databases as a black box they throw SQL at. Understanding why they work the way they do (tables, normalization, indexes, joins) makes you dramatically better at using them, designing schemas, and debugging performance issues.
The Relational Model
A relational database organizes data into tables (also called relations). Each table has a fixed set of columns (attributes), and each row represents one record.
Here's a simple users table:
| id | name | created_at | |
|---|---|---|---|
| 1 | alice@example.com | Alice | 2024-01-01 |
| 2 | bob@example.com | Bob | 2024-01-15 |
And an orders table:
| id | user_id | total | status |
|---|---|---|---|
| 101 | 1 | 49.99 | shipped |
| 102 | 1 | 12.50 | pending |
| 103 | 2 | 99.00 | delivered |
The user_id column in orders is a foreign key — it references the id column in users. This is how relationships are modeled. Alice (user 1) has two orders. Bob (user 2) has one.
The power of this model is that you can ask questions that span multiple tables: "Give me all orders for Alice, along with her email address." That's a join — the relational database's killer feature.
What is the purpose of a foreign key in a relational database?
Normalization: Eliminating Redundancy
Normalization is the process of structuring your tables to reduce data redundancy and prevent anomalies. There are several "normal forms" (1NF, 2NF, 3NF, BCNF) but in practice, aiming for 3NF covers almost everything you need.
First Normal Form (1NF)
Each column should contain atomic (indivisible) values. No repeating groups or arrays in a cell.
Bad (violates 1NF):
| order_id | items |
|---|---|
| 101 | "shirt, pants, socks" |
Good:
| order_id | item |
|---|---|
| 101 | shirt |
| 101 | pants |
| 101 | socks |
Second Normal Form (2NF)
Every non-key column must depend on the entire primary key, not just part of it. This only applies when you have composite primary keys.
If your primary key is (order_id, product_id), then product_name should be in a products table (it depends only on product_id), not in the order-product junction table.
Third Normal Form (3NF)
Non-key columns must depend only on the primary key, not on other non-key columns.
Bad (violates 3NF):
| order_id | zip_code | city |
|---|---|---|
| 101 | 10001 | New York |
Here, city depends on zip_code, not on order_id. If a zip code's city changes, you'd have to update every order row. Fix it by creating a zip_codes table.
Normalization isn't about following rules for their own sake. It's about making sure each fact is stored in exactly one place. That's what prevents update anomalies and keeps your data consistent.
A table stores zip_code and city in the same row as order_id. This violates Third Normal Form (3NF).
Joins: The Superpower of Relational Databases
Joins let you query related data across multiple tables in a single query. There are four main types:
INNER JOIN: Returns rows where there's a match in both tables. The most common join.
SELECT users.name, orders.total
FROM users
INNER JOIN orders ON users.id = orders.user_id;LEFT JOIN: Returns all rows from the left table, plus matching rows from the right table. Non-matching right rows get NULL.
-- Get all users, even those with no orders
SELECT users.name, orders.total
FROM users
LEFT JOIN orders ON users.id = orders.user_id;RIGHT JOIN — The mirror of LEFT JOIN. Less commonly used.
FULL OUTER JOIN — Returns all rows from both tables, matching where possible.
Joins are powerful, but they have a cost. Joining large tables without proper indexes can bring a production database to its knees.
Which JOIN type returns all rows from the left table, filling NULL for non-matching rows on the right?
Indexes: How Databases Stay Fast
An index is a separate data structure that the database maintains to speed up queries. Without an index, a query like SELECT * FROM users WHERE email = 'alice@example.com' has to scan every single row in the table — a full table scan. With an index on the email column, the database can find Alice's row in O(log n) time using a B-tree.
How Indexes Work
PostgreSQL's default index type is a B-tree — a balanced tree structure where every leaf node is equidistant from the root. Lookups, range scans, and ordering are all efficient.
Other index types:
- Hash index: O(1) exact lookups, but no range queries
- GiST / GIN: For full-text search and geometric data
- Partial index: Index only a subset of rows (e.g., only active users)
The Trade-Off
Every index speeds up reads but slows down writes. When you insert or update a row, every index on that table must be updated too. A table with 10 indexes takes roughly 10x longer to insert into than a table with none.
Rule of thumb: add indexes on columns you frequently filter, join, or sort on. Don't add indexes preemptively on every column.
-- Create an index on email for fast user lookups
CREATE INDEX idx_users_email ON users(email);
-- Composite index for queries that filter on both columns
CREATE INDEX idx_orders_user_status ON orders(user_id, status);EXPLAIN ANALYZE
Always check your query plan before and after adding indexes:
EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 1 AND status = 'pending';If you see "Seq Scan" on a large table, you probably need an index.
PostgreSQL's default index type is a B-tree, which supports both exact lookups and range queries.
What is the main trade-off of adding many indexes to a table?
Real Databases: PostgreSQL, MySQL, SQLite
PostgreSQL is the gold standard for general-purpose SQL. It supports JSON, full-text search, geospatial queries (PostGIS), custom types, and a rich extension ecosystem. It has excellent ACID compliance, MVCC (multi-version concurrency control) for high-concurrency reads, and a very active community. Use it for most new projects.
MySQL (and MariaDB) is ubiquitous in the web world — WordPress, Drupal, and a huge chunk of legacy web apps run on it. The InnoDB storage engine provides full ACID compliance. It's slightly faster for certain simple read workloads, but PostgreSQL has caught up and surpassed it for most use cases. If you're starting fresh today, choose Postgres.
SQLite is a single-file, serverless SQL database embedded directly in the application. It powers every iOS and Android app (it's baked into both operating systems), every Chrome browser, and countless desktop apps. It's not for high-concurrency server workloads, but for local storage, prototyping, and edge deployments it's unbeatable. Turso and Cloudflare D1 are building interesting distributed SQLite products worth watching.
SQLite is a good choice for high-concurrency server workloads handling thousands of simultaneous writes.
When Relational Databases Are the Best Choice
Almost always, to be direct about it. Specifically:
- Structured data with known relationships: users, orders, products, invoices, anything with clear entities and relationships between them
- Complex queries: multi-table joins, aggregations, window functions, subqueries
- Data integrity requirements: foreign keys, check constraints, and unique constraints ensure your data stays valid
- Transactional workloads: ACID guarantees are a must for financial systems, inventory management, booking systems
- Reporting and analytics: SQL is incredibly expressive for ad-hoc analysis
The case against relational databases is narrow: truly schema-less data, extreme write throughput (millions/second), or data that's naturally graph-shaped. For everything else, defaulting to PostgreSQL and expanding from there is the pragmatic move.
Summary
Relational databases organize data into tables with explicit relationships via foreign keys. Normalization (1NF → 2NF → 3NF) eliminates redundancy by ensuring each fact lives in exactly one place. Joins let you query across tables efficiently. Indexes — B-trees by default — make reads fast at the cost of slower writes, so index columns you actually query on. PostgreSQL is the modern default for new projects; MySQL remains dominant in legacy web systems; SQLite excels at embedded and local storage. Relational databases are the right choice for the vast majority of application data — structured, transactional, and relational.
How helpful was this content?
Comments
Sign in to join the discussion
Saved on this device only
Sign in to sync progress across devices