CAP, Google Spanner, & Survival: Eventual Consistency | ArangoDB
In Next gen NoSQL: The demise of eventual consistency a recent post on Gigaom FoundationDB founder Dave Rosenthal proclaims the demise of eventual consistency. He argues that Google Spanner “demonstrates the falsity of a trade-off between strong consistency and high availability”. In this article I show that Google Spanner does not disprove CAP, but rather chooses one of many possible compromises between total consistency and total availability. For organizations with a less potent infrastructure than Google other compromises might be more suitable, and therefore eventual consistency is still a very good idea, even for future generations of nosql databases.
Dave’s article brings up a good and interesting discussion. However I believe that in the end he misses the point.
At the end he talks about machines failing and clients no longer being able to talk with those machines. Well, that is obvious. But a partition is a failure that is very different from a machine failing. Nobody has problems with consistency when a machine fails.
A partition is a network connection failure such that on either side of the failed network connection are both clients and database servers.
I will try to show that there still is a real and important consideration in this situation (which is at the core of the CAP theorem).
And I will try to demonstrate this with an example. So, assume Nozama is a large web shop. It has a database with all the products, containing – among other information – for each product the number of items in stock. When a customer orders a product the system checks whether there are still items left in stock, and if so reserves one item for this customer and reduces the number of items in stock accordingly. Reliability is very important for Nozama, and so they have their servers in two computing centres, one in Europe and one in the USA. Customers in Europe are connected to the computing centre in Europe and customers in the USA are connected to the computing centre in the USA.
In this context availability means that all customers (in Europe and the USA) can order any product at any time (as long as it is in stock). I.e. the only reason why the website would tell the customer that she cannot order a particular product is because no items are in stock for this product.
And consistency means that all customers (all over the world) have at all times the same information about the number of available items in stock. I.e. it will never happen that a customer orders a product (with positive feedback from the website) and is informed later that the product is no longer deliverable.
During normal operation everything is well. The web shop is available and consistent. There is no trade-off between availability and consistency.
But if the connection between the two computing centres in Europe and the USA is broken, then it becomes impossible to maintain both properties of the system (the web shop). Assume that we still allow all customers to order all products. If a customer in the USA orders the last item of a particular product (say the CD “Bach, The Well Tempered Clavier, Sviatoslav Richter”), the information that the number of items in stock is now 0 cannot be replicated to Europe. So if somebody in Europe wants to order this product, then the system will let her (because the database still contains the information that there is one item left in stock). But later she will be informed that the product is alas no longer deliverable. That means that consistency is violated. If – one the other hand – Nozama does no longer allow all customers to order all products when the connection is broken, then obviously the system is no longer fully available. Note that this does not mean that no customer cannot order anything anymore. It only means that some customers cannot order some things temporarily. Possible restrictions could be that all customers in the USA can still order everything, and no customers in Europe can order anything (maybe make that dependent on the time of day). Or one could define for each product individually on which continent this product is orderable during the breakdown. But in any case there is some restriction on the availability. This is basically what the CAP theorem says. You cannot keep both properties – availability and consistency – during the breakdown of the connection (the so called partition). So there is a decision to be made (actually as Brewer himself points out, this is not a binary decision, there is a whole spectrum of possible approaches – but you cannot have everything, something has to give).
What is important is that both possibilities could be the “right” approach depending on the business. In our example it basically depends on the costs of failing to deliver due to overselling in case of a connection breakdown. If you are selling books, and in case you cannot deliver a book you send out an apology and a small gift coupon, than allowing the system to become inconsistent is probably the right approach. If you are selling airplane tickets and customers will sue you if you do not deliver, than restricting availability is probably the right approach.
If you read the Google Spanner article carefully, you will notice that they cannot invalidate the CAP theorem. They have decided to opt for consistency. Which means that there can be situations where the Spanner database will not be fully available.
What Google can do is to make the probability of a network connection breakdown very (very) small. Basically they can reduce this probability more than any other company. Google owns so much networking infrastructure (and peering agreements, …), that it is possible for them to make the probability of totally losing the connection between two computing centres become very small. And for them it is not too expensive to do so. In particular it is cheaper for them to make this probability really small than to carry the development cost of building every application in such a way that this application can deal with inconsistency. However this is probably unique to Google. While many other companies have the reliability requirement to host their web site in two (or more) computing centres, very few can – with acceptable costs – make the connection between those two computing centres highly available. Note that the most likely reason for failure of such a connection is no longer the proverbial caterpillar, but misconfiguration of a router, switch, firewall, etc.
So those companies still have to face the question what – for them – is the best approach. Reduce the availability in case of a partition (as mentioned above that does not mean that no client can use the database any more at all, only that some clients cannot use some parts of the database). Or to allow the system to become somewhat inconsistent – and to fix up everything afterwards when the network breakdown is over.
And here (and only here) does eventual consistency enter the picture. Eventual consistency does not mean that each and every write (even when the system is not partitioned) to the database only becomes available to other clients after some time (eventually). Instead it means that if the database becomes inconsistent – during a partition – it will get rid of the inconsistencies when the partition is over. In a sense the database will heal itself (you only have to write down the rules how to merge divergent states).
This is actually a great property of the database! Without the interaction of administrators or specialists, without any human interaction the system will solve all the inconsistencies and return to a consistent state. If you have ever been responsible for a distributed database without this property, you will be painfully aware how contagious inconsistencies are, and instead of going away they tend to spread out.
So, if a company decides to opt for a distributed system that is fully available all the time, eventual consistency means that the inconsistencies that enter the system during network breakdowns do not spread out to catastrophic proportions but will disappear when the network breakdown is over.
A final remark
The real contribution of Spanner to the theory and practice of databases is not, that they show that one need not decide between availability and consistency. You still have to make this decision. And Google still has to make that decision also, they just can make the probability of a partition much smaller than everybody else. The real contribution is that they show how one can achieve consistency (in a distributed but not partitioned database) efficiently by maintaining an accurate and coherent clock on all machines. For a long time maintaining an accurate and coherent clock on distributed machines was so difficult that we stopped thinking about this possibility at all. But now with NTP, GPS and “your own atomic clocks” it becomes possible and even “cost effective” (though not “cheap”).
Do you have an opinion on this? Let us know in the comments!
6 Comments
Leave a Comment
Get the latest tutorials, blog posts and news:
Hey Martin, thanks for the reply.
“He argues that Google Spanner “demonstrates the falsity of a trade-off between strong consistency and high availability”. In this article I show that Google Spanner does not disprove CAP…”
I’m not sure I wrote the article clearly enough, or maybe that you read it quite the way I intended. I think we are all on the same page–Neither Spanner nor FoundationDB disprove CAP! Both choose consistency.
In the article I am trying to discriminate between “Availability” in the CAP sense and “high availability” in the fault-tolerance sense. They are quite different and seem to confuse a lot of people new to this subject. Many people think that a consistent, transactional, distributed database cannot be fault-tolerant. That is false and that falsity *is* demonstrated by Spanner.
Thanks for the article. Great points about Google’s ability to among the most likely companies to opt for the CP side of equation because of their technical and financial resources.
Best,
Dave Rosenthal
Hi Dave,
thank you for your comment.
Yes I am afraid I did not read the article quite the way you intended.
I did not understand that you want to make a distinction between “Availibility” in the CAP sense and “High Availibilty” in the fault-tolerance sense. And probably the main reason that I did not understand this is the fact that I don’t know how one would define “High Availibity” in the fault-tolerance sense.
Maybe you can help me and give me your definition. What are the faults that may happen? What do the clients (resp. some of the clients) experience when those faults happen?
Best,
martin
Hi Martin,
We actually have a white paper on our website that goes into what we believe to be the misunderstanding and the difference between CAP Availability and overall database system availability In the more commonly understood sense (the system can still serve read / write requests). Here’s the link: foundationdb.com/white-papers/the-cap-theorem
You can also see how FoundationDB stays available (in the non-CAP sense) during a live partition in our fault tolerance demo video: foundationdb.com/blog/foundationdb-fault-tolerance-demo-video
Hopefully this sheds better light on our position on CAP. If you were to update your blog so our position is accurately reflected, that would be super cool 🙂
Side note – I may be in Cologne for NoSQL Matters in April – it would be good to get together if you are around.
Best,
Nick Lavezzo
Hi Nick,
Thank you for your quick response. I will definitely read your white paper and post here all new insights. I must ask you for a little patience because I have two days packed with workshops ahead of me.
I am looking forward to meet you in Cologne during the NoSQL matters.
Best,
martin
I have read your white paper with great interest. Here is my understanding.
If we have a distributed system (like the example in my blog post), there is a partition (in the sense of a network/communication failure, not just the failure of machines/nodes) , and we want to keep consistency, then some clients will not be able to execute some of their requests.
There is no disagreement about that. You yourself write in your white paper: “… For any client communicating only with A, the database is down. …”.
There is also no disagreement that other clients can still execute their requests. You write: “… For these clients, the database will remain available for reads and writes…”.
So there is no different opinion about the situation: during a partition some clients cannot execute all their requests while others may remain able to.
The only difference is how to call this situation. In other articles about the CAP theorem this is called “not available” or “restricted availability”. You call it “overall database availability” or “system availability”.
One could argue that this is mostly a matter of “is the glass half empty or half full?”.
And I would agree that calling the system “not available” in this situation is problematic, because it might cause people to believe that the CAP theorem says “if anything goes wrong (server dying, network problem, …), then no client can execute any request any more”. And this is clearly not true.
But I would also argue that your formulation is not unproblematic, because it might cause people to believe that with the right database “if anything goes wrong, then the system will remain available to all clients and still remain consistent”.
The first sentence from you white paper “A database can provide strong consistency and system availability during network partitions.” certainly seems to imply this (until one realizes later on that the “system availability” is given as long as some clients can execute their requests).
Furthermore your white paper and blog article do not make it clear that there are applications for which “restricted availability” or “system availability” in case of a partition is not the optimal solution. This is the main point I was trying to make in my blog post.
Of course such applications, that want all clients to remain able to execute all their requests in all cases (even when a partition happens), must sometimes allow the system to become inconsistent. This is the consequence of the CAP theorem. And they must fix the inconsistencies when the network partition is over. And this is where “eventual consistency” comes in handy, because it means that the database will basically heal itself.
Of course, all this is not just binary “a database is either consistent or remains available”. There are many possible designs with many different trade-offs how to deal with this situation. For more in depth information about those the following articles are a first step.
Werner Vogels discusses (among many other things) the various shades of “consistency”: “Amazon Dynamo” http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html.
Daniel Abadi discusses (again among other things) the correlations between the behaviour of the database in the case of a partition and the normal operation: “Problems with CAP” http://dbmsmusings.blogspot.de/2010/04/problems-with-cap-and-yahoos-little.html
And Eric Brewer addresses a huge lot of issues in his tour-de-force: “CAP twelve years later” http://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed
Best regards,
martin
a bit off topic.. though any thoughts on CockroachDB? It’s a new open source Spanner project
http://www.wired.com/2014/07/cockroachdb/