


distributed systems: All content tagged as distributed systems in NoSQL databases and polyglot persistence

Battle-Test Your MongoDB Cluster

Kristina Chodorow1 shared a good list of tests to put a MongoDB cluster through:

Here are some exercises to battle-test your MongoDB instance before going into production. You’ll need a Database Master (aka DM) to make bad things happen to your MongoDB install and one or more players to try to fix it.

Netflix is using a series of tools that perform similar tests against their Cassandra clusters. With a small twist: they are run against the production clusters.

  1. In a recent post, Kristina Chodorow, one of the most prominent figures of the MongoDB world, announced that she has decided to become a Googler. Good luck, Kristina! 

Original title and link: Battle-Test Your MongoDB Cluster (NoSQL database©myNoSQL)


Using Apache ZooKeeper to Build Distributed Apps (And Why)

Great intro to ZooKeeper and the problems it can help solve by Sean Mackrory:

Done wrong, distributed software can be the most difficult to debug. Done right, however, it can allow you to process more data, more reliably and in less time. Using Apache ZooKeeper allows you to confidently reason about the state of your data, and coordinate your cluster the right way. You’ve seen how easy it is to get a ZooKeeper server up and running. (In fact, if you’re a CDH user, you may already have an ensemble running!) Think about how ZooKeeper could help you build more robust systems

Leaving aside the main topic of the post for a second, another important lesson here is that the NIH syndrome is very expensive in distributed systems.
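To make the coordination value concrete, here is a toy, in-memory simulation of ZooKeeper's classic leader-election recipe (clients create ephemeral sequential znodes; the lowest sequence number leads, and when its session dies the next-lowest takes over). This is a sketch of the algorithm only, with a plain dict standing in for the znode tree; it is not a real ZooKeeper client.

```python
# Toy simulation of ZooKeeper's leader-election recipe.
# Real clients would use ephemeral sequential znodes and watches;
# here a dict stands in for the znode tree.

class FakeZk:
    def __init__(self):
        self.counter = 0
        self.nodes = {}          # znode path -> owning client id

    def create_sequential(self, prefix, client):
        # Zero-padded sequence numbers make lexicographic order
        # match creation order, as with real sequential znodes.
        path = f"{prefix}{self.counter:010d}"
        self.counter += 1
        self.nodes[path] = client
        return path

    def leader(self):
        # The client owning the lowest sequence number leads.
        return self.nodes[min(self.nodes)]

    def session_expired(self, client):
        # Ephemeral nodes vanish when their owner's session dies.
        self.nodes = {p: c for p, c in self.nodes.items() if c != client}

zk = FakeZk()
for c in ["a", "b", "c"]:
    zk.create_sequential("/election/n-", c)

print(zk.leader())        # a: lowest sequence number wins
zk.session_expired("a")   # leader dies; next-lowest takes over
print(zk.leader())        # b
```

In the real recipe each client watches only the znode immediately below its own, so a leader failure wakes exactly one successor instead of causing a thundering herd.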

Original title and link: Using Apache ZooKeeper to Build Distributed Apps (And Why) (NoSQL database©myNoSQL)


Introducing Highly Available Transactions: The Relationship Between CAP and ACID Transactions

Learning from Peter Bailis:

While the CAP Theorem is fairly well understood, the relationship between CAP and ACID transactions is not. If we consider the current lack of highly available systems providing arbitrary multi-object operations with ACID-like semantics, it appears that CAP and transactions are incompatible. This is partly due to the historical design of distributed database systems, which typically chose consistency over high availability. Standard database techniques like two-phase locking and multi-version concurrency control do not typically perform well in the event of partial failure, and the master-based (i.e., master-per-shard) and overlapping quorum-based techniques often adopted by many distributed database designs are similarly unavailable if users are partitioned from the anointed primary copies.
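The unavailability of overlapping quorums under partition comes down to simple arithmetic: R + W > N guarantees that read and write quorums intersect (so reads observe the latest write), but any client cut off from at least max(R, W) replicas is stuck. A minimal sketch:

```python
# Why overlapping quorums trade availability for consistency:
# with N replicas, a write needs W acks and a read needs R replies.

def quorums_intersect(n, r, w):
    # R + W > N means every read quorum overlaps every write quorum,
    # so at least one replica in any read quorum has the latest write.
    return r + w > n

def available(reachable, r, w):
    # A partitioned client can proceed only if it can still reach
    # enough replicas for both quorum sizes.
    return reachable >= r and reachable >= w

n, r, w = 3, 2, 2
print(quorums_intersect(n, r, w))        # True: reads see the last write
print(available(1, r, w))                # False: the minority side is stuck
```

This is exactly the trade-off the quoted paragraph describes: the configurations that guarantee consistency are the ones that leave minority partitions unavailable.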

There’s also a paper (PDF) authored by Peter Bailis, Alan Fekete, Ali Ghodsi, Joseph M. Hellerstein, and Ion Stoica. These names should tell you something.

Original title and link: Introducing Highly Available Transactions: The Relationship Between CAP and ACID Transactions (NoSQL database©myNoSQL)


MongoDB Fault Tolerance - Broken by Design

Emin Gün Sirer:

So, MongoDB is broken, and not just superficially so; it’s broken by design. If you’re relying on its fault tolerance, you’re probably using it wrong.

He’s not the first to write about some of the critical issues in MongoDB. Every once in a while, there has been at least one post detailing one of these. But none of them stopped MongoDB’s adoption. Why? My only explanations are that:

  1. these posts are not reaching enough people;
  2. MongoDB is used mostly for simple scenarios where the occurrence of such errors can be ignored;
  3. those using MongoDB for more complicated scenarios have developed internal, application-specific workarounds.

Original title and link: MongoDB Fault Tolerance - Broken by Design (NoSQL database©myNoSQL)


Notes on Distributed Systems for Young Bloods

Jeff Hodges:

Below is a list of some lessons I’ve learned as a distributed systems engineer that are worth being told to a new engineer. Some are subtle, and some are surprising, but none are controversial. This list is for the new distributed systems engineer to guide their thinking about the field they are taking on. It’s not comprehensive, but it’s a good beginning.

Go through this list. Twice. Then for every item ask “why is this” at least 3 times. It won’t instantly make you a better systems engineer, but you’ll better understand what people developing or operating large-scale systems are facing.

Original title and link: Notes on Distributed Systems for Young Bloods (NoSQL database©myNoSQL)


Alex Sicular's Recap of Ricon 2012, a Distributed Systems Conference for Developers

While in conference mode1 I’m like a sponge, but I’m almost no good at putting my chaotic notes into a format that is usable to anyone else.

Alex Sicular has done a great job writing down his thoughts about Basho’s fantastic Ricon 2012, and linking to his post makes me feel less guilty for not being able to post mine—I’m learning to do better for future events:

Chatter by conference attendees left me convinced that Ricon was a success. Ricon was well-executed, well-attended and actually interesting. But more importantly, it was relevant. For those of us at the conference, we actually work in this space. We are interested in the ongoing development of distributed solutions to a number of problems. The conference delivered on creating a space that brought us together to share solutions and learn about continuing advancements. For a new conference to have a successful maiden voyage is no small feat in my book. I, for one, am looking forward to the next one.

My only contribution to Alex Sicular’s great recap is to provide some links to the talks his blog post refers to:

Joe Hellerstein: Programming Principles for a Distributed Era

The PDF can be downloaded from here

Eric Brewer: Advancing Distributed Systems

Russel Brown and Sean Cribbs: Data Structures in Riak

Bryan Fink: Riak Pipe: Distributed Processing System

Ryan Zezeski: Yokozuna: Riak + Solr

More presentation slides can be found on the official Ricon 2012 site.

  1. My thanks again to the Basho team for inviting me to Ricon 2012 and also to DataStax team for the Cassandra Summit invitation. 

Original title and link: Alex Sicular’s Recap of Ricon 2012, a Distributed Systems Conference for Developers (NoSQL database©myNoSQL)


Doing Redundant Work to Speed Up Distributed Queries

Great post by Peter Bailis looking at how some systems are reducing tail latency by distributing reads across nodes:

Open-source Dynamo-style stores have different answers. Apache Cassandra originally sent reads to all replicas, but CASSANDRA-930 and CASSANDRA-982 changed this: one commenter argued that “in IO overloaded situations” it was better to send read requests only to the minimum number of replicas. By default, Cassandra now sends reads to the minimum number of replicas 90% of the time and to all replicas 10% of the time, primarily for consistency purposes. (Surprisingly, the relevant JIRA issues don’t even mention the latency impact.) LinkedIn’s Voldemort also uses a send-to-minimum strategy (and has evidently done so since it was open-sourced). In contrast, Basho Riak chooses the “true” Dynamo-style send-to-all read policy.
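The tail-latency effect behind the send-to-all policy is easy to see in a small simulation: if each replica occasionally straggles, taking the first of N replies cuts the p99 dramatically, because all N must straggle at once for the request to be slow. The latency distribution below is invented for illustration, not measured from any of the systems named above.

```python
import random

# Sketch: a read sent to all N replicas completes at the *fastest*
# reply; a read sent to one replica inherits that replica's tail.

random.seed(42)

def replica_latency():
    base = random.uniform(1.0, 5.0)                            # ms, typical
    return base + (100.0 if random.random() < 0.05 else 0.0)   # 5% stragglers

def read(strategy, n=3):
    lats = [replica_latency() for _ in range(n)]
    return min(lats) if strategy == "send_to_all" else lats[0]

trials = 10_000
single = sorted(read("single") for _ in range(trials))
fanout = sorted(read("send_to_all") for _ in range(trials))
p99 = lambda xs: xs[int(0.99 * len(xs))]
print(f"p99 single replica: {p99(single):6.1f} ms")
print(f"p99 send-to-all:    {p99(fanout):6.1f} ms")
```

The flip side, which the quoted JIRA discussion is about, is that fanout multiplies read I/O by N, which is exactly what hurts "in IO overloaded situations."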

Original title and link: Doing Redundant Work to Speed Up Distributed Queries (NoSQL database©myNoSQL)


Dynamic Reconfiguration of Apache ZooKeeper

A slide deck by Alexander Shraer providing a shorter version of the Dynamic Reconfiguration of Primary/Backup Clusters in Apache ZooKeeper paper.

There are still some scenarios where the proposed algorithm will not work, but I cannot tell how often these will occur:

  1. Quorum of new ensemble must be in sync
  2. Another reconfig in progress
  3. Version condition check fails
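The three failure cases above amount to guards on a reconfiguration request, with the version condition acting as a compare-and-swap on the configuration. A hypothetical sketch of those guards (names invented for illustration; the actual ZooKeeper implementation differs):

```python
# Hypothetical guards corresponding to the three failure cases above.

class ReconfigError(Exception):
    pass

def reconfig(cluster, new_members, expected_version):
    # Case 2: another reconfig in progress.
    if cluster["reconfig_in_progress"]:
        raise ReconfigError("another reconfig in progress")
    # Case 3: version condition check fails (compare-and-swap on config).
    if cluster["version"] != expected_version:
        raise ReconfigError("version condition check failed")
    # Case 1: a quorum of the *new* ensemble must already be in sync.
    synced = sum(1 for m in new_members if cluster["in_sync"].get(m, False))
    if synced <= len(new_members) // 2:
        raise ReconfigError("quorum of new ensemble must be in sync")
    cluster["members"] = list(new_members)
    cluster["version"] += 1
    return cluster["version"]

cluster = {"members": ["a", "b", "c"], "version": 7,
           "reconfig_in_progress": False,
           "in_sync": {"a": True, "b": True, "d": True}}
print(reconfig(cluster, ["a", "b", "d"], expected_version=7))  # 8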

One of the most interesting slides in the deck is the one explaining the failure free flow:

ZooKeeper Failure Free Flow

The Behavior of EC2/EBS Metadata Replicated Datastore

The Amazon post about the service disruption that happened late last month provides an interesting description of the behavior of the Amazon EC2 and EBS metadata datastores:

The EC2 and EBS APIs are implemented on multi-Availability Zone replicated datastores. These datastores are used to store metadata for resources such as instances, volumes, and snapshots. To protect against datastore corruption, currently when the primary copy loses power, the system automatically flips to a read-only mode in the other Availability Zones until power is restored to the affected Availability Zone or until we determine it is safe to promote another copy to primary.

Original title and link: The Behavior of EC2/EBS Metadata Replicated Datastore (NoSQL database©myNoSQL)


Dynamic Reconfiguration of Primary/Backup Clusters in Apache ZooKeeper

A paper by Alexander Shraer, Benjamin Reed, Flavio Junqueira (Yahoo) and Dahlia Malkhi (Microsoft):

Dynamically changing (reconfiguring) the membership of a replicated distributed system while preserving data consistency and system availability is a challenging problem. In this paper, we show that reconfiguration can be simplified by taking advantage of certain properties commonly provided by Primary/Backup systems. We describe a new reconfiguration protocol, recently implemented in Apache Zookeeper. It fully automates configuration changes and minimizes any interruption in service to clients while maintaining data consistency. By leveraging the properties already provided by Zookeeper our protocol is considerably simpler than state of the art.

The corresponding ZooKeeper issue has been created in 2008 and the new protocol should be part of ZooKeeper 3.5.0

Dapper, a Large-Scale Distributed Systems Tracing Infrastructure

Google’s paper about their large-scale distributed systems tracing solution Dapper which inspired Twitter’s Zipkin:

Here we introduce the design of Dapper, Google’s production distributed systems tracing infrastructure, and describe how our design goals of low overhead, application-level transparency, and ubiquitous deployment on a very large scale system were met. Dapper shares conceptual similarities with other tracing systems, particularly Magpie [3] and X-Trace [12], but certain design choices were made that have been key to its success in our environment, such as the use of sampling and restricting the instrumentation to a rather small number of common libraries.

Download or read the paper after the break.

Tracing Distributed Systems With Twitter Zipkin

Whenever a request reaches Twitter, we decide if the request should be sampled. We attach a few lightweight trace identifiers and pass them along to all the services used in that request. By only sampling a portion of all the requests we reduce the overhead of tracing, allowing us to always have it enabled in production.

The Zipkin collector receives the data via Scribe and stores it in Cassandra along with a few indexes. The indexes are used by the Zipkin query daemon to find interesting traces to display in the web UI.

There a many APM solutions out there, but sometimes the overhead or the lack of support for specific components may lead to the need of custom solutions like this one. While in a different field, this is just another example of why products and services should be designed with openness and integration in mind.

Original title and link: Tracing Distributed Systems With Twitter Zipkin (NoSQL database©myNoSQL)