CAP Theorem

Updated June 6, 2026
M
Magic Magnets Team
10 min read

The CAP theorem is one of the most cited and most misunderstood ideas in distributed systems. It comes up in nearly every system design interview, and it is worth understanding deeply rather than just memorizing the acronym.

Proposed by Eric Brewer in 2000 and formally proven by Gilbert and Lynch in 2002, CAP states that a distributed system can guarantee at most two of three properties:

  • Consistency
  • Availability
  • Partition Tolerance

Let's define each one precisely, because the informal definitions lead to confusion.

algobase.dev
Normal operation: both regions serve requests. US writes flow to the primary DB; cross-region replication keeps the EU replica in sync (~70-150ms lag). All three CAP properties appear to hold, but this is the easy case. The real question is what happens during a failure.
1 / 1

Normal operation — both regions healthy

The Three Properties

Consistency (in CAP's definition)

Every read receives the most recent write, or an error.

This is not the same as ACID consistency in databases. CAP consistency is about all nodes seeing the same data at the same time. If you write a value to node A, any subsequent read from node B must return that updated value.

Think of it as a system-wide guarantee: there's only one version of the truth, and everyone sees it immediately.

Quiz Time

In the CAP theorem, what does Consistency specifically mean?

Availability

Every request receives a response, rather than an error, though it might not be the most recent data.

A highly available system always responds. It might return stale data, but it doesn't say "I can't answer right now." No node is allowed to refuse requests.

Partition Tolerance

The system continues operating even when network messages between nodes are lost or delayed.

A network partition is when one or more nodes cannot communicate with the rest. This is not a hypothetical edge case; it is a reality of distributed systems. Networks fail, packets are dropped, and cables get cut.

Why You Always Need P

Here's the crux: partition tolerance is not optional in a distributed system.

If your system runs across multiple nodes (which is the definition of a distributed system), network partitions will happen. Not "might happen," but will happen. Hardware fails, switches get misconfigured, and datacenters have outages.

A system that cannot tolerate partitions can only run safely on a single node, which defeats the purpose of a distributed system.

So the real choice is: when a partition occurs, what do you sacrifice? Consistency or Availability?

Quiz Time

Partition tolerance is optional for distributed systems.

algobase.dev
Network partition: the cross-region replication link is broken. US writes continue, but EU is now isolated. The EU replica falls behind. Both regions are still "up," but they are serving different versions of the data. You must now choose: do you keep EU serving (AP) or shut it down (CP)?
1 / 1

Network partition occurs

CP Systems: Consistency over Availability

When a partition is detected, a CP system refuses to answer rather than risk returning inconsistent data. You get a correct answer or an error, never stale data presented as current.

algobase.dev
CP choice (consistency over availability): EU refuses all writes and returns 503 errors until the partition heals. Users in Europe cannot proceed, but there is no risk of serving or accepting stale data. ZooKeeper, etcd, and HBase take this approach. The AP alternative: EU keeps serving reads from its stale replica where users get a response, but it might be wrong.
1 / 1

CP choice during partition

Real examples:

  • HBase: refuses writes during certain failure scenarios
  • Zookeeper: a minority partition stops accepting writes; only the majority quorum can proceed
  • etcd: same approach, where the minority side of a partition refuses requests

When to choose CP: financial systems, inventory management, anything where serving incorrect data is worse than being temporarily unavailable. You would rather say "try again in a moment" than show a user a bank balance that is $500 lower than it actually is.

Quiz Time

During a network partition, a CP system will:

AP Systems: Availability over Consistency

When a partition occurs, an AP system keeps responding to all requests, but nodes on different sides of the partition might return different (inconsistent) data. After the partition heals, the system eventually reconciles and converges to a consistent state. This is called eventual consistency.

Real examples:

  • Cassandra: always responds; resolves conflicts on read using timestamps
  • CouchDB: syncs between nodes asynchronously; conflict resolution is application-defined
  • DynamoDB: by default eventually consistent; can be configured for strong consistency on reads at higher cost

When to choose AP: social media feeds, product catalogs, shopping cart contents, search indexes, DNS; anywhere where serving slightly stale data is acceptable and uptime is paramount. Your Twitter feed being 2 seconds behind is fine. Your Twitter being down is not.

Quiz Time

Which of the following is an example of an AP system?

The False Trichotomy

One important nuance: CAP describes a binary choice during a network partition. In practice, partitions are rare. Most of the time, systems aren't partitioned, and you can deliver both consistency and availability.

The choice of CP vs AP is really: "What do we do when the bad thing happens?" Most of the time, that scenario never occurs.

This is why the CAP theorem is sometimes criticized as oversimplified: it treats this edge-case scenario as the defining characteristic of a system, when most systems operate in the normal (non-partitioned) case the vast majority of the time.

PACELC: A More Complete Model

Because CAP only addresses partition scenarios, Daniel Abadi proposed the PACELC model as an extension:

If there's a Partition, choose between Availability and Consistency. Else (normal operation), choose between Latency and Consistency.

PACELC adds the normal-case trade-off: even without a partition, there is a trade-off between latency and consistency.

To achieve strong consistency without a partition, a write must be acknowledged by multiple nodes before returning success. That coordination takes time, which adds latency. Systems that skip this coordination (writing to one node and replicating asynchronously) have lower latency but weaker consistency guarantees.

SystemPartitionNormal Operation
DynamoDB (default)PAEL (eventual + low latency)
CassandraPAEL
SpannerPCEC (strong consistency, higher latency)
HBasePCEC
MySQLPCEC

PACELC explains why even "CA" systems (single-node databases like MySQL) make a latency/consistency trade-off in their replication settings.

Quiz Time

PACELC extends CAP by adding a trade-off that applies even when there is no partition.

Practical Implications for System Design

In a system design interview — and in real engineering decisions — CAP shapes database selection:

Choose a CP-leaning database when:

  • Data correctness is critical (payments, inventory, reservations)
  • Users cannot act on stale data safely
  • You can tolerate brief unavailability during failures

Choose an AP-leaning database when:

  • High availability is the top priority
  • Eventual consistency is acceptable (social feeds, analytics, search)
  • You need to handle writes from multiple geographic regions simultaneously

The honest answer: most real systems use different consistency models for different data. Your user profile might be eventually consistent (fine if it takes 2 seconds to propagate). Your payment record must be strongly consistent (you cannot charge someone twice).

Pick your consistency model per use case, not per system. Multi-model architectures are the norm at scale.

Summary

CAP theorem states that distributed systems can guarantee only two of three properties: Consistency, Availability, and Partition Tolerance. Since network partitions are unavoidable, the real choice is between C and A when partitions occur. CP systems (ZooKeeper, HBase) sacrifice availability to maintain consistency. AP systems (Cassandra, DynamoDB) stay available but may return stale data. PACELC extends CAP to cover the normal-operation trade-off between latency and consistency, because even without partitions, strong consistency requires coordination that costs latency. In practice, different parts of your system need different consistency guarantees.

Consistency Models

How helpful was this content?

Comments

0/2000

Sign in to join the discussion

Saved on this device only

Sign in to sync progress across devices