CAP Theorem
Updated June 6, 2026The CAP theorem is one of the most cited and most misunderstood ideas in distributed systems. It comes up in nearly every system design interview, and it is worth understanding deeply rather than just memorizing the acronym.
Proposed by Eric Brewer in 2000 and formally proven by Gilbert and Lynch in 2002, CAP states that a distributed system can guarantee at most two of three properties:
- Consistency
- Availability
- Partition Tolerance
Let's define each one precisely, because the informal definitions lead to confusion.
Normal operation — both regions healthy
The Three Properties
Consistency (in CAP's definition)
Every read receives the most recent write, or an error.
This is not the same as ACID consistency in databases. CAP consistency is about all nodes seeing the same data at the same time. If you write a value to node A, any subsequent read from node B must return that updated value.
Think of it as a system-wide guarantee: there's only one version of the truth, and everyone sees it immediately.
In the CAP theorem, what does Consistency specifically mean?
Availability
Every request receives a response, rather than an error, though it might not be the most recent data.
A highly available system always responds. It might return stale data, but it doesn't say "I can't answer right now." No node is allowed to refuse requests.
Partition Tolerance
The system continues operating even when network messages between nodes are lost or delayed.
A network partition is when one or more nodes cannot communicate with the rest. This is not a hypothetical edge case; it is a reality of distributed systems. Networks fail, packets are dropped, and cables get cut.
Why You Always Need P
Here's the crux: partition tolerance is not optional in a distributed system.
If your system runs across multiple nodes (which is the definition of a distributed system), network partitions will happen. Not "might happen," but will happen. Hardware fails, switches get misconfigured, and datacenters have outages.
A system that cannot tolerate partitions can only run safely on a single node, which defeats the purpose of a distributed system.
So the real choice is: when a partition occurs, what do you sacrifice? Consistency or Availability?
Partition tolerance is optional for distributed systems.
Network partition occurs
CP Systems: Consistency over Availability
When a partition is detected, a CP system refuses to answer rather than risk returning inconsistent data. You get a correct answer or an error, never stale data presented as current.
CP choice during partition
Real examples:
- HBase: refuses writes during certain failure scenarios
- Zookeeper: a minority partition stops accepting writes; only the majority quorum can proceed
- etcd: same approach, where the minority side of a partition refuses requests
When to choose CP: financial systems, inventory management, anything where serving incorrect data is worse than being temporarily unavailable. You would rather say "try again in a moment" than show a user a bank balance that is $500 lower than it actually is.
During a network partition, a CP system will:
AP Systems: Availability over Consistency
When a partition occurs, an AP system keeps responding to all requests, but nodes on different sides of the partition might return different (inconsistent) data. After the partition heals, the system eventually reconciles and converges to a consistent state. This is called eventual consistency.
Real examples:
- Cassandra: always responds; resolves conflicts on read using timestamps
- CouchDB: syncs between nodes asynchronously; conflict resolution is application-defined
- DynamoDB: by default eventually consistent; can be configured for strong consistency on reads at higher cost
When to choose AP: social media feeds, product catalogs, shopping cart contents, search indexes, DNS; anywhere where serving slightly stale data is acceptable and uptime is paramount. Your Twitter feed being 2 seconds behind is fine. Your Twitter being down is not.
Which of the following is an example of an AP system?
The False Trichotomy
One important nuance: CAP describes a binary choice during a network partition. In practice, partitions are rare. Most of the time, systems aren't partitioned, and you can deliver both consistency and availability.
The choice of CP vs AP is really: "What do we do when the bad thing happens?" Most of the time, that scenario never occurs.
This is why the CAP theorem is sometimes criticized as oversimplified: it treats this edge-case scenario as the defining characteristic of a system, when most systems operate in the normal (non-partitioned) case the vast majority of the time.
PACELC: A More Complete Model
Because CAP only addresses partition scenarios, Daniel Abadi proposed the PACELC model as an extension:
If there's a Partition, choose between Availability and Consistency. Else (normal operation), choose between Latency and Consistency.
PACELC adds the normal-case trade-off: even without a partition, there is a trade-off between latency and consistency.
To achieve strong consistency without a partition, a write must be acknowledged by multiple nodes before returning success. That coordination takes time, which adds latency. Systems that skip this coordination (writing to one node and replicating asynchronously) have lower latency but weaker consistency guarantees.
| System | Partition | Normal Operation |
|---|---|---|
| DynamoDB (default) | PA | EL (eventual + low latency) |
| Cassandra | PA | EL |
| Spanner | PC | EC (strong consistency, higher latency) |
| HBase | PC | EC |
| MySQL | PC | EC |
PACELC explains why even "CA" systems (single-node databases like MySQL) make a latency/consistency trade-off in their replication settings.
PACELC extends CAP by adding a trade-off that applies even when there is no partition.
Practical Implications for System Design
In a system design interview — and in real engineering decisions — CAP shapes database selection:
Choose a CP-leaning database when:
- Data correctness is critical (payments, inventory, reservations)
- Users cannot act on stale data safely
- You can tolerate brief unavailability during failures
Choose an AP-leaning database when:
- High availability is the top priority
- Eventual consistency is acceptable (social feeds, analytics, search)
- You need to handle writes from multiple geographic regions simultaneously
The honest answer: most real systems use different consistency models for different data. Your user profile might be eventually consistent (fine if it takes 2 seconds to propagate). Your payment record must be strongly consistent (you cannot charge someone twice).
Pick your consistency model per use case, not per system. Multi-model architectures are the norm at scale.
Summary
CAP theorem states that distributed systems can guarantee only two of three properties: Consistency, Availability, and Partition Tolerance. Since network partitions are unavoidable, the real choice is between C and A when partitions occur. CP systems (ZooKeeper, HBase) sacrifice availability to maintain consistency. AP systems (Cassandra, DynamoDB) stay available but may return stale data. PACELC extends CAP to cover the normal-operation trade-off between latency and consistency, because even without partitions, strong consistency requires coordination that costs latency. In practice, different parts of your system need different consistency guarantees.
How helpful was this content?
Comments
Sign in to join the discussion
Saved on this device only
Sign in to sync progress across devices