CAP Theorem

In theoretical computer science the CAP theorem, also known as Brewer’s theorem, states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:

Consistency (all nodes see the same data at the same time)
Availability (a guarantee that every request receives a response about whether it was successful or failed)
Partition tolerance (the system continues to operate despite arbitrary message loss)

According to the theorem, a distributed system can satisfy any two of these guarantees at the same time, but not all three.

(courtesy of Wikipedia)

In summary:

C – consistency (ACID)

A – availability (uptime)

P – partition tolerance

The layman’s description of partition tolerance is basically the ability to split your data across multiple, geographically distinct partitions (George Reese).

From the book The Architecture of Open Source Applications,

Partitioning: For scalability, availability, or durability reasons, does the data need to live on multiple servers? How do you know which record is on which server?

Splitting data across partitions reminds us of Google File System and BigTable. Partitions can be

  • node within a LAN (Level 1 Partition)
  • node across a LAN (Level 2 Partition)
  • node within a WAN (Level 3 Partition)
  • node across a WAN (Level 4 Partition)

How much you can tolerate latency or downtime defines your zone comfort for availability. My interpretation of CAP Theorem from a system point of view is that, partition tolerance and availability are just two sides of the same coin. The only difference is that, partition tolerance operates at the node level in a system, whereas availability operates at the lower level of request and response.

So, in notation,

  • C + A –>  -P
  • A + P –>  -C
  • P + C –>  -A

CAP Theorem basically says, “You may choose two but not all three”.

A + P = -C is what NoSQL is all about. However, availability and partition tolerance are subjective since it depends on what is your concept of availability and arbitrary message loss. As the level of partition tolerance moves higher, so too are costs (see Google MegaStore multi-datacenter datastore).

It is interesting to note that Xeround is somewhat immune to CAP Theorem but only at Level 2 Partition. From Xeround FAQ,

Do you offer high-availability and auto failover across multiple regions/availability-zones/clouds?

We plan to expand to offering availability across multiple clouds, regions and availability zones within existing data centers.Our technology allows us to arbitrarily set the number of data replicas that we manage as well as the role and location of each one of these. By design, our system is tolerant to faults in any of its replicas and continues to perform during and after a replica’s failure. In its current form, our service is offered with two active-active replicas, both residing in the same data center (and availability zone) as a means of protecting against a single server’s failure.

In the future we will offer other configurations like 4-replicas (1-per-zone) – where in such a configuration the system will be able to withstand the loss of one, two or even three of the availability zones that it is using and still provide the database service. Similarly, another deployment form could be split between the east and west data centers to ensure the DB’s availability even in the event of a data center meltdown.

Btw, here is Amazon’s description of multiple locations:

Multiple Locations – Amazon EC2 provides the ability to place instances in multiple locations. Amazon EC2 locations are composed of Regions and Availability Zones. Availability Zones are distinct locations that are engineered to be insulated from failures in other Availability Zones and provide inexpensive, low latency network connectivity to other Availability Zones in the same Region. By launching instances in separate Availability Zones, you can protect your applications from failure of a single location. Regions consist of one or more Availability Zones, are geographically dispersed, and will be in separate geographic areas or countries. The Amazon EC2 Service Level Agreement commitment is 99.95% availability for each Amazon EC2 Region. Amazon EC2 is currently available in seven regions: US East (Northern Virginia), US West (Oregon), US West (Northern California), EU (Ireland), Asia Pacific (Singapore), Asia Pacific (Tokyo), and AWS GovCloud.

And yet, the recent Amazon EBS outage calls availability zones into question.


One Comment Add yours

  1. Itamar Haber says:

    Thanks for sharing your interpretation and insights on CAP. Being a distributed system, Xeround is bound by the theorem and in that context our database is C+A (=-P). Regrettably, we are not immune to network splits but we do try to offset our partition intolerance by using automatic, majority-based reconciliation mechanisms when these occur. This approach is suitable when handling several common partitioning scenarios, but in others scenarios we would need to resort to external arbitration to overcome the network split.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s