ALL COVERED TOPICS

NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter

NAVIGATE MAIN CATEGORIES

Close

CAP: All content tagged as CAP in NoSQL databases and polyglot persistence

Clustrix Sierra Clustered Database Engine

In the light of publicly announcing customers, I wanted to read a bit about Clustrix Clustered Database Systems.

The company homepage is describing the product:

  • scalable database appliaces for Internet-scale work loads
  • Linearly Scalable: fully distributed, parallel architecture provides unlimited scale
  • SQL functionality: full SQL relational and data consistency (ACID) functionality
  • Fault-Tolerant: highly available providing fail-over, recovery, and self-healing
  • MySQL Compatible: seamless deployment without application changes.

All these sounded pretty (too) good. And I’ve seen a very similar presentation for Xeround: Elastic, Always-on Storage Engine for MySQL.

So, I’ve continued my reading with the Sierra Clustered Database Engine whitepaper (PDF).

Here are my notes:

  • Sierra is composed of:
    • database personality module: translates queries into internal representation
    • distributed query planner and compiler
    • distributed shared-nothing execution engine
    • persistent storage
    • NVRAM transactional storage for journal changes
    • inter-node Infiniband
  • queries are decomposed into query fragments which are the unit of work. Query fragments are sent for execution to nodes containing the data.
  • query fragments are atomic operations that can:
    • insert, read, update data
    • execute functions and modify control flow
    • perform synchronization
    • send data to other nodes
    • format output
  • query fragments can be executed in parallel
  • query fragments can be cached with parameterized constants at the node level
  • determining where to sent the query fragments for execution is done using either range-based rules or hash function
  • tables are partitioned into slices, each slice having redundancy replicas
    • size of slices can be automatically determined or configured
    • adding new nodes to the cluster results in rebalancing slices
    • slices contained on a failed device are reconstructed using their replicas
  • one of the slices is considered primary
  • writes go to all replicas and are transactional
  • all reads fo the the slice primary

The paper also exemplifies the execution of 4 different queries:

SELECT * FROM T1 WHERE uid=10

SELECT uid, name FROM T1 JOIN T2 on T1.gid = T2.gid WHERE uid=10

SELECT * FROM T1 WHERE uid<100 and gid>10 ORDER BY uid LIMIT 5

INSERT INTO T1 VALUES (10,20)

Questions:

  • who is coordinating transactions that may be executed on different nodes?
  • who is maintains the topology of the slices? In case of a node failure, you’d need to determine:
    1. what slices where on the failing node
    2. where are the replicas for each of these slices
    3. where new replicas will be created
    4. when will new replicas become available for writes
  • who elects the slice primary?

Clustrix Sierra Clustered Database Engine

Original title and link: Clustrix Sierra Clustered Database Engine (NoSQL databases © myNoSQL)


Data Consistency and Synchronization in Scalable NoSQL Solutions

On Xeround blog:

What would be sacrificed if synchronization and transaction mechanisms were added on top of NoSQL foundations?  The answer is performance. Distributed and consistent data services will always be slower than pure raw distributed data service.

… and availability.

Original title and link: Data Consistency and Synchronization in Scalable NoSQL Solutions (NoSQL databases © myNoSQL)

via: http://blog.xeround.com/2010/12/bending-the-rules-for-sql


Google Megastore: Scalable, Highly Available Storage for Interactive Services

A new paper from Google:

Megastore blends the scalability of a NoSQL datastore with the convenience of a traditional RDBMS in a novel way, and provides both strong consistency guarantees and high availability.
We provide fully serializable ACID semantics within fine-grained partitions of data. This partitioning allows us to synchronously replicate each write across a wide area network with reasonable latency and support seamless failover between datacenters.
This paper describes Megastore’s semantics and replication algorithm.

Megastore seems to be the solution behind the Google App Engine high replication datastore.

Emphases are mine.

Original title and link: Google Megastore: Scalable, Highly Available Storage for Interactive Services (NoSQL databases © myNoSQL)

via: http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf


Google App Engine High Replication Datastore

The High Replication Datastore provides the highest level of availability for your reads and writes, at the cost of increased latency for writes and changes in consistency guarantees in the API. The High Replication Datastore increases the number of data centers that maintain replicas of your data by using the Paxos algorithm to synchronize that data across datacenters in real time.

Still not completely decentralized a la Amazon Dynamo.

Original title and link: Google App Engine High Replication Datastore (NoSQL databases © myNoSQL)

via: http://googleappengine.blogspot.com/2011/01/announcing-high-replication-datastore.html


Errors in Database Systems Still Must Consider Network Partitions

Dan Weinreb on Michael Stonebraker’s network partition scenarios from the overly discussed paper about ☞ Erros in Database Systems, Eventual Consistency, and the CAP Theorem :

If you mean a few computers connected together by an Ethernet, with redundant hardware all around, then the chance of a failure of the network itself is relatively low. However, real-world data centers with a relatively large number of servers rarely work this way. The problem is that a real network is very complicated. It depends on switches at both level 3 (routers) and level 2 (hubs). Situations can arise in which pieces of the network are mis-configured by accident; these can be hard to find due, ironically, the very redundancy that was added to avoid failures. In particular, there is no way to make any kind of guarantee about the latency within the network, nor the likelihood that a packet will make it from its source to its destination.

John Hugg (VoltDB) made an interesting comment:

Step 1: Figure out how much the network partitions that will cause your system to be unavailable will cost.

Step 2: Figure out how much giving up consistency will cost.

Step 3: Make your CAP decision base on these estimated values.

@justinsheehy

Original title and link: Errors in Database Systems Still Must Consider Network Partitions (NoSQL databases © myNoSQL)

via: http://danweinreb.org/blog/errors-in-database-systems-still-must-consider-network-partitions


CAP Continued: Someone is Wrong on the Internet

Once again closing the circle on this last week’s CAP theorem discussions, Jeff Darcy comments on Stonebraker’s clarifications on the CAP theorem:

In my experience, network partitions do not happen often. Specifically, they occur less frequently than the sum of bohrbugs, application errors, human errors and reprovisioning events. So it doesn’t much matter what you do when confronted with network partitions. Surviving them will not “move the needle” on availability because higher frequency events will cause global outages. Hence, you are giving up something (consistency) and getting nothing in return.


So, because network partitions occur less than some other kind of error, we shouldn’t worry about them? Because more people die in cars than in planes, we shouldn’t try to make planes safer? Also, notice how he says that network partitions are rare in his experience. His experience may be vast, but much of it is irrelevant because the scale and characteristics of networks nowadays are unlike those of even five years ago. People with more recent experience at higher scale seem to believe that network partitions are an important issue, and claiming that partitions are rare in (increasingly common) multi-datacenter environments is just ridiculous. Based on all this plus my own experience, I think dealing with network partitions does “move the needle” on availability and is hardly “nothing in return” at all. Sure, being always prepared for a partition carries a cost, but so does the alternative and that’s the whole point of CAP.

Original title and link: CAP Continued: Someone is Wrong on the Internet (NoSQL databases © myNoSQL)

via: http://pl.atyp.us/wordpress/?p=3110


MongoDB and CAP Theorem

In the light of this week’s topic, the CAP theorem, the weekend video is… MongoDB and CAP theorem:

Original title and link: MongoDB and CAP Theorem (NoSQL databases © myNoSQL)


The CAP Theorem… Again

Today looks to be (again) the day of the CAP theorem[1][2], so let’s do a quick summary:

  1. We had Coda Hale’s ☞ You can’t sacrifice partition tolerance:

    Of the CAP theorem’s Consistency, Availability, and Partition Tolerance, Partition Tolerance is mandatory in distributed systems. You cannot not choose it. Instead of CAP, you should think about your availability in terms of yield (percent of requests answered successfully) and harvest (percent of required data actually included in the responses) and which of these two your system will sacrifice when failures happen.

  2. Jeff Darcy followed up with ☞ Another CAP article:

    It seems to me that there is a consensus emerging. Even if Gilbert and Lynch only formally proved a narrower version of Brewer’s original conjecture, that conjecture and the tradeoffs it implies are still alive and well and highly relevant to the design of real working systems that serve real business needs.

    and ☞ Reactions to Coda’s CAP post:

    The last point is whether CAP really boils down to “two out of three” or not. Of course not, even though I’ve probably said that myself a couple of times. The reason is merely pedagogical. It’s a pretty good approximation, much like teaching Newtonian physics or ideal gases in chemistry. You have to get people to understand the basic shape of things before you start talking about the exceptions and special cases, and “two out of three” is a good approximation. Sure, you can trade off just a little of one for a little of another instead of purely either/or, but only after you thoroughly understand and appreciate why the simpler form doesn’t suffice. The last thing we need is people with learner’s permits trying to build exotic race cars. They just give the doubters and trolls more ammunition with which to suppress innovation.

  3. Henry Robinson’s ☞ CAP Confusion: Problems with ‘partition tolerance’ popped up too:

    Not ‘choosing’ P is analogous to building a network that will never experience multiple correlated failures. This is unreasonable for a distributed system – precisely for all the valid reasons that are laid out in the CACM post about correlated failures, OS bugs and cluster disasters – so what a designer has to do is to decide between maintaining consistency and availability. Dr. Stonebraker tells us to choose consistency, in fact, because availability will unavoidably be impacted by large failure incidents. This is a legitimate design choice, and one that the traditional RDBMS lineage of systems has explored to its fullest, but it implicitly protects us neither from availability problems stemming from smaller failure incidents, nor from the high cost of maintaining sequential consistency.

  4. Many of the above articles were referring to Michael Stonebraker’s ☞ Errors in Database Systems, Eventual Consistency, and the CAP Theorem:

    In summary, one should not throw out the C so quickly, since there are real error scenarios where CAP does not apply and it seems like a bad tradeoff in many of the other situations.

So, we pretty much went full circle. I just hope that Eric Brewer will do ☞ follow up:

I really need to write an updated CAP theorem paper


  1. Michael Stonebraker’s clarifications on the CAP theorem and the older but related Daniel Abadi’s Problems with CAP  ()
  2. Nati Shalom’s ☞ NoCAP and my own NoCAP… is wrong (nb make sure you also read the comments)  ()

Original title and link: The CAP Theorem… Again (NoSQL databases © myNoSQL)


Problems with CAP

Michael Stonebraker’s clarifications on the CAP theorem reminded me of Daniel Abadi’s PACELC:

To me, CAP should really be PACELC – if there is a partition (P) how does the system tradeoff between availability and consistency (A and C); else (E) when the system is running as normal in the absence of partitions, how does the system tradeoff between latency (L) and consistency (C)?

Original title and link: Problems with CAP (NoSQL databases © myNoSQL)

via: http://dbmsmusings.blogspot.com/2010/04/problems-with-cap-and-yahoos-little.html


Clarifications on the CAP Theorem and Data-Related Errors

Michael Stonebraker detailing his perspective on CAP theorem:

In my experience, network partitions do not happen often.  Specifically, they occur less frequently than the sum of bohrbugs, application errors, human errors and reprovisioning events.  So it doesn’t much matter what you do when confronted with network partitions.  Surviving them will not “move the needle” on availability because higher frequency events will cause global outages.  Hence, you are giving up something (consistency) and getting nothing in return. 

It is difficult to argue against the idea that in an environment where your application and devops and bugs kill your data storage more often than system and network failures would, the CAP theorem remains one of your system priorities.

Original title and link: Clarifications on the CAP Theorem and Data-Related Errors (NoSQL databases © myNoSQL)

via: http://voltdb.com/blog/clarifications-cap-theorem-and-data-related-errors


NoCAP... Is Wrong

Nati Shalom (Gigaspaces):

The diagram below illustrates one of the examples by which we could achieve write scalability and throughput without compromising on consistency.

No CAP

As with the previous examples we break our data into partitions to handle our write scaling between nodes. To achieve high throughput we use in-memory storage instead of disk. As in-memory device tend to be significantly faster and concurrent then disk and since network speed is no longer a bottleneck we can achieve high throughput and low latency even when we use synchronous write to the replica.

Unfortunately, a couple of things are wrong:

  1. CAP theorem is not about write scalability. If we replace consistency, availability, or partition tolerance with another characteristic we will have to talk about a different theorem.
  2. Network speed is a bottleneck for cross data center communications. Plus it can go down.
  3. Considering the diagram below, with a partition in the network between the 2 in memory data grid nodes, please tell me how can you achieve maintain both consistency and availability. No CAP

My conclusion is simple: whatever solution/product/vendor you’re going to use the CAP theorem stands for all distributed systems.

Original title and link: NoCAP… Is Wrong (NoSQL databases © myNoSQL)

via: http://blog.gigaspaces.com/2010/10/15/nocap/


MySQL and MongoDB Sitting In a Boat

An interesting post from lunar logic guys about using MySQL and MongoDB for their Kanban product, how that get there and the tools they are using.

As a personal note, I thought how this system would be characterized in terms of CAP. It should be quite clear that we cannot speak about consistency over the two systems as MongoDB doesn’t really support transactions (you can check these notes on MongoDB for more details). So, in case their system would be using master-master MySQL replication and replica-pairs for MongoDB, and the internal tools would know how to work with this setup, we could probably say that we have an AP system. But if any of these preconditions are not fulfilled, I’d say both A and P are lost.

via: http://lunarlogicpolska.com/blog/2010/02/15/mysql-and-mongodb-working-together-in-kanbanery