30 Must-Know System Design Concepts
Updated June 3, 2026This article is your system design cheat sheet. You'll find 30 concepts that show up in interviews, in production systems, and whenever you're building at scale. For each one, you get a definition, the "why" behind it, and a concrete example you can visualize.
Bookmark this. You'll be back.
Networking
These are the technologies that move data from client to server and back. Without them, there's no system.
1. DNS (Domain Name System)
DNS translates human-readable domain names (like google.com) into IP addresses that machines can route to. When you type a URL, your browser queries a chain of DNS servers to resolve it before a single byte of content is fetched. At scale, companies use services like Route 53 or Cloudflare with geolocation routing to send users to the nearest data center.
Step 1 — DNS lookup: client queries the DNS server to resolve the domain to an IP address
2. Load Balancer
A load balancer distributes incoming traffic across multiple backend servers. Without one, a single server becomes a bottleneck and a single point of failure. Load balancers can operate at Layer 4 (TCP/IP) or Layer 7 (HTTP), with the latter being able to route based on URL paths, headers, or cookies. Example: AWS ALB routes /api/* requests to one server group and /images/* to another.
Step 2 — Load Balancer: dynamic traffic is routed to the Load Balancer
3. CDN (Content Delivery Network)
A CDN is a globally distributed network of servers that caches and serves static content (images, JS, CSS, video) from locations physically close to the user. Instead of every request hitting your origin server in Virginia, a user in Tokyo gets served from a Tokyo edge node. Example: Netflix uses a custom CDN called Open Connect to deliver video from ISP-colocated servers.
Step 3 — CDN Edge: static assets are served from the nearest CDN edge node
4. HTTP/HTTPS and TLS
HTTP is the application-layer protocol that powers the web. HTTPS wraps it in TLS encryption so that data in transit can't be intercepted. Every production system uses HTTPS. TLS termination often happens at the load balancer layer so backend services can communicate unencrypted within a trusted private network.
5. WebSockets
WebSockets provide a persistent, bidirectional communication channel between client and server over a single TCP connection. Unlike HTTP's request-response model, either side can send a message at any time. Example: Slack, WhatsApp Web, and multiplayer games all use WebSockets for real-time messaging.
6. TCP vs UDP
TCP guarantees ordered, reliable delivery with acknowledgements and retransmission. UDP is connectionless and offers no delivery guarantee — but it's much faster. TCP is used for web traffic, databases, and anything that requires correctness. UDP is used for video streaming, DNS lookups, and gaming where a dropped packet is better than a delayed one.
Databases
Where your data lives. The choice between SQL and NoSQL, the scaling strategies, the consistency guarantees — these decisions ripple through every other part of your system.
SQL and NoSQL have different strengths; replication and sharding handle scale
7. SQL Databases (Relational)
SQL databases store data in tables with rows and columns, enforcing a schema and supporting complex joins and transactions. They follow ACID properties. Example: PostgreSQL powers most of the backend at Instagram, Shopify, and many fintech companies. Use SQL when you need strong consistency and complex queries.
8. NoSQL Databases
NoSQL is an umbrella term for databases that don't use a relational model. Subtypes include document stores (MongoDB), key-value stores (DynamoDB), wide-column stores (Cassandra), and graph databases (Neo4j). They typically sacrifice some consistency for horizontal scalability and flexible schemas. Example: Cassandra handles writes for Discord's message history — billions of messages with high write throughput across multiple data centers.
9. Indexing
An index is a data structure (usually a B-tree or LSM tree) that lets a database find rows matching a query without scanning the entire table. Indexes make reads dramatically faster but slow down writes because the index must also be updated. Example: An index on user_id in an orders table makes SELECT * FROM orders WHERE user_id = 42 instantaneous instead of scanning millions of rows.
10. Database Replication
Replication copies data from one database node (the primary/leader) to one or more replica nodes. Replicas serve reads, reducing load on the primary. If the primary fails, a replica can be promoted. Replication lag — the delay between a write on the primary and its appearance on replicas — is the key trade-off. Example: Most web apps configure PostgreSQL with one primary and two read replicas behind a query router.
11. Database Sharding
Sharding horizontally partitions data across multiple database nodes, each called a shard. Each shard holds a subset of the data (e.g., users A-M on shard 1, users N-Z on shard 2). Sharding enables massive horizontal scale but complicates cross-shard queries and transactions. Example: Uber shards their trip data by city to keep each shard's dataset manageable.
12. ACID vs BASE
ACID (Atomicity, Consistency, Isolation, Durability) is the set of guarantees that relational databases provide for transactions — your bank transfer either fully completes or fully rolls back. BASE (Basically Available, Soft state, Eventually consistent) is the model that most NoSQL systems use — they prioritize availability and accept that data may be temporarily inconsistent across nodes. Your shopping cart being slightly out of sync for a second is BASE. Your wire transfer completing only halfway is a failure of ACID.
Caching
Fast, in-memory stores that sit between your application and slower backends. They're one of the most effective ways to reduce latency and load on your database.
A cache layer between app and database reduces load by orders of magnitude
13. Cache
A cache is a fast, in-memory data store that sits in front of slower storage (a database or an API) and serves frequently requested data without hitting the source every time. Caches reduce latency by orders of magnitude. Example: Redis caches Twitter's timeline — serving pre-assembled feeds from memory instead of querying a database on every load.
14. Cache Eviction Policies
When a cache is full, it must decide what to remove to make space. Common policies: LRU (Least Recently Used) evicts the item that hasn't been accessed the longest — good for temporal locality. LFU (Least Frequently Used) evicts the item accessed the fewest times — better for stable hot items. TTL (Time To Live) evicts items after a fixed expiry window.
A cache is full and must evict an item. Which policy is best suited for keeping items that are accessed rarely but consistently popular over a long period?
15. Cache Invalidation
Cache invalidation is famously hard. There are three main strategies: TTL-based (let the cache entry expire), write-through (update the cache and database together on every write), and write-behind/write-back (update the cache first, propagate to the database asynchronously). Each trades consistency for performance differently.
16. Content Delivery Network (CDN Caching)
CDNs are caches at the network edge. When a user requests a static asset, the CDN checks its local cache first. If it's a miss, it fetches from the origin and caches it for future requests. Cache-Control headers control how long CDN nodes keep assets before re-validating. Example: A Cache-Control: max-age=31536000 header tells the CDN to serve a JS bundle for a full year without checking the origin.
Messaging & Queues
The backbone of asynchronous systems. When you don't want to wait for something to complete, you push it to a queue and keep moving.
Producers drop messages into a queue; consumers process them independently
17. Message Queue
A message queue decouples the service that produces a task (the producer) from the service that executes it (the consumer). The producer drops a message and moves on; the consumer picks it up when it has capacity. This enables asynchronous processing, buffering against traffic spikes, and retry on failure. Example: Amazon SQS queues order-processing jobs so the checkout flow never blocks waiting for inventory updates.
18. Pub/Sub (Publish-Subscribe)
In pub/sub, producers publish messages to a topic and any number of subscribers receive a copy. Unlike a queue where a message is consumed once, pub/sub broadcasts to all subscribers. Example: Google Cloud Pub/Sub lets a payment service notify the order service, the notification service, and the analytics service all at once with a single event.
In a pub/sub system, a message published to a topic is consumed by only one subscriber.
19. Kafka
Apache Kafka is a distributed, durable, high-throughput event streaming platform. It stores messages in ordered, immutable logs called partitions. Consumers track their own offset in the log, enabling replay. Kafka handles millions of events per second and retains them for days or weeks. Example: LinkedIn (Kafka's birthplace) uses it to stream activity events, metrics, and logs across their entire infrastructure.
What makes Kafka fundamentally different from a standard message queue like SQS?
20. Event-Driven Architecture
In an event-driven system, services communicate by emitting and reacting to events rather than calling each other directly. This decouples services, improves scalability, and makes it easy to add new consumers without changing the producer. The trade-off is eventual consistency and increased operational complexity.
Storage
Different types of storage for different jobs. Object storage for files, block storage for VM disks, and everything scales to absurd levels.
21. Object Storage
Object storage (like AWS S3, Google Cloud Storage) stores files as discrete objects with metadata and a unique key. There's no hierarchy — just a flat namespace of keys. It's massively scalable, cheap, and durable (S3 is designed for 99.999999999% durability). Example: Dropbox stores every file you upload as an object in S3, deduplicated by content hash.
Object storage holds files as discrete objects accessed via HTTP APIs
22. Block Storage
Block storage divides data into fixed-size blocks and addresses them like a hard drive. It's used as the underlying storage for virtual machine disks. Example: AWS EBS (Elastic Block Store) provides block storage for EC2 instances — you format it with a file system and treat it exactly like a local disk.
Block storage divides data into blocks acting as raw virtual volumes
Scalability Patterns
The techniques for handling more traffic, more data, more users. Every scaling strategy is a tradeoff between simplicity and cost.
A single server handles all traffic
23. Horizontal vs Vertical Scaling
Vertical scaling (scale up) means adding more CPU, RAM, or disk to a single machine. It's simple but has a hard ceiling. Horizontal scaling (scale out) means adding more machines and distributing load across them. It's more complex but theoretically unlimited. Most modern cloud architectures are designed to scale horizontally — stateless services behind a load balancer can add nodes on demand.
Vertical scaling (scale up) vs horizontal scaling (scale out)
24. Consistent Hashing
Consistent hashing is a technique for distributing data across a ring of nodes such that when a node is added or removed, only the keys adjacent to that node need to be remapped — not every key in the system. It's used in distributed caches and databases. Example: Cassandra and DynamoDB use consistent hashing to distribute data across cluster nodes with minimal reshuffling.
What problem does consistent hashing specifically solve in distributed caches and databases?
25. Rate Limiting
Rate limiting restricts how many requests a client can make in a given time window to prevent abuse and protect service capacity. Common algorithms: Token Bucket (clients earn tokens at a fixed rate; each request costs a token), Sliding Window (count requests in the last N seconds with sub-second accuracy). Example: GitHub's API allows 5,000 authenticated requests per hour using a sliding window counter.
26. Circuit Breaker
A circuit breaker wraps calls to a downstream service and monitors for failures. If failures exceed a threshold, the circuit "opens" and all further calls immediately fail without hitting the failing service — giving it time to recover. After a timeout, the circuit enters a half-open state and probes whether the service has recovered. Example: Netflix's Hystrix library popularized this pattern for their microservices architecture.
Distributed Systems
When your system spans multiple machines, you face new problems that don't exist in a single server. Consensus, consistency, failure handling — these concepts separate real systems from tutorials.
Multiple nodes must coordinate; leaders handle consensus, replicas provide fault tolerance
27. CAP Theorem
The CAP theorem states that a distributed system can guarantee at most two of three properties: Consistency (every read returns the most recent write), Availability (every request gets a non-error response), and Partition Tolerance (the system continues operating despite network splits). Since partitions are inevitable, the real choice is between CP and AP. Banks choose CP. Social feeds typically choose AP.
The CAP theorem states that a distributed system can simultaneously guarantee consistency, availability, and partition tolerance.
28. Replication and Consistency Models
Between strict consistency (every read sees the latest write) and eventual consistency (reads may see stale data but will converge), there's a spectrum. Read-your-writes consistency guarantees you see your own writes. Monotonic reads guarantee you don't see older versions over time. Picking the right consistency model is a core design decision. Example: DynamoDB lets you choose between eventually consistent reads (cheaper, faster) and strongly consistent reads per request.
29. Leader Election
In distributed systems, some operations (like writing to a replicated log) must be coordinated through a single leader to avoid conflicts. Leader election algorithms like Raft determine which node is the leader, ensure only one leader exists at a time, and handle leader failures by automatically electing a successor. Example: etcd (which powers Kubernetes) uses Raft to elect a leader for its distributed key-value store.
30. Idempotency
An operation is idempotent if performing it multiple times has the same effect as performing it once. In distributed systems, retries are inevitable (network failures, timeouts). If your operations aren't idempotent, retries cause double-charges, duplicate orders, or duplicate messages. Example: Stripe assigns a unique Idempotency-Key header to payment requests so that retrying a failed request never charges the customer twice.
Stripe assigns a unique Idempotency-Key header to payment requests so that retrying a failed request never charges the customer twice.
What to Take Away
These 30 concepts are the building blocks of every production system you'll see. You don't need to memorize implementation details, but you should understand what each does, why you'd use it, and what you're trading off.
The power of system design comes from combinations. A URL shortener is consistent hashing + caching + a SQL database. A social feed is pub/sub + eventual consistency + CDN caching. A payment system is ACID transactions + idempotency + load balancing.
The patterns repeat. Learn them once, and you can design thousands of systems.
Saved on this device only
Sign in to sync progress across devices