


distributed systems: All content tagged as distributed systems in NoSQL databases and polyglot persistence

MongoDB Fault Tolerance - Broken by Design

Emin Gün Sirer:

So, MongoDB is broken, and not just superficially so; it’s broken by design. If you’re relying on its fault tolerance, you’re probably using it wrong.

He’s not the first to write about critical issues in MongoDB; every once in a while a post shows up detailing one of them. But none of these posts has stopped MongoDB’s adoption. Why? My only explanations are that:

  1. these posts are not reaching enough people;
  2. MongoDB is used mostly for simple scenarios where the occurrence of such errors can be ignored;
  3. those using MongoDB for more complicated scenarios have developed internal, application-specific workarounds.

Original title and link: MongoDB Fault Tolerance - Broken by Design (NoSQL database©myNoSQL)


Notes on Distributed Systems for Young Bloods

Jeff Hodges:

Below is a list of some lessons I’ve learned as a distributed systems engineer that are worth being told to a new engineer. Some are subtle, and some are surprising, but none are controversial. This list is for the new distributed systems engineer to guide their thinking about the field they are taking on. It’s not comprehensive, but it’s a good beginning.

Go through this list. Twice. Then, for every item, ask “why is this?” at least three times. It won’t make you a better systems engineer, but you’ll understand better what people developing or operating large-scale systems are facing.

Original title and link: Notes on Distributed Systems for Young Bloods (NoSQL database©myNoSQL)


Alex Sicular's Recap of Ricon 2012, a Distributed Systems Conference for Developers

While in conference mode¹ I’m like a sponge, but I’m almost no good at putting my chaotic notes into a format that is usable to anyone else.

Alex Sicular has done a great job writing down his thoughts about Basho’s fantastic Ricon 2012, and linking to his post makes me feel less guilty for not being able to post mine. I’m learning to do better for the next events:

Chatter by conference attendees left me convinced that Ricon was a success. Ricon was well-executed, well-attended and actually interesting. But more importantly, it was relevant. For those of us at the conference, we actually work in this space. We are interested in the ongoing development of distributed solutions to a number of problems. The conference delivered on creating a space that brought us together to share solutions and learn about continuing advancements. For a new conference to have a successful maiden voyage is no small feat in my book. I, for one, am looking forward to the next one.

My only contribution to Alex Sicular’s great recap is to provide some links to the talks his blog post refers to:

Joe Hellerstein: Programming Principles for a Distributed Era

The PDF can be downloaded from here.

Eric Brewer: Advancing Distributed Systems

Russel Brown and Sean Cribbs: Data Structures in Riak

Bryan Fink: Riak Pipe: Distributed Processing System

Ryan Zezeski: Yokozuna: Riak + Solr

More presentation slides can be found on the official Ricon 2012 site.

  1. My thanks again to the Basho team for inviting me to Ricon 2012 and also to DataStax team for the Cassandra Summit invitation. 

Original title and link: Alex Sicular’s Recap of Ricon 2012, a Distributed Systems Conference for Developers (NoSQL database©myNoSQL)


Doing Redundant Work to Speed Up Distributed Queries

Great post by Peter Bailis looking at how some systems are reducing tail latency by distributing reads across nodes:

Open-source Dynamo-style stores have different answers. Apache Cassandra originally sent reads to all replicas, but CASSANDRA-930 and CASSANDRA-982 changed this: one commenter argued that “in IO overloaded situations” it was better to send read requests only to the minimum number of replicas. By default, Cassandra now sends reads to the minimum number of replicas 90% of the time and to all replicas 10% of the time, primarily for consistency purposes. (Surprisingly, the relevant JIRA issues don’t even mention the latency impact.) LinkedIn’s Voldemort also uses a send-to-minimum strategy (and has evidently done so since it was open-sourced). In contrast, Basho Riak chooses the “true” Dynamo-style send-to-all read policy.
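The latency trade-off between the two strategies is easy to see in a toy simulation (a sketch with made-up latency numbers, not code from any of these systems): with a long-tailed per-replica latency, sending the read to all replicas and taking the fastest response trims the tail, at the cost of extra IO on every request, which is exactly the overload concern raised in CASSANDRA-930.

```python
import random

random.seed(42)

def replica_latency():
    """Hypothetical per-replica read latency: usually fast, occasionally slow."""
    base = random.uniform(1.0, 5.0)          # ms, common case
    if random.random() < 0.05:               # 5% of reads hit a slow replica
        base += random.uniform(50.0, 200.0)  # GC pause, IO contention, ...
    return base

def send_to_minimum():
    """Read from a single replica (minimum fan-out)."""
    return replica_latency()

def send_to_all(replicas=3):
    """Read from all replicas; the client uses the first (fastest) response."""
    return min(replica_latency() for _ in range(replicas))

def p99(samples):
    return sorted(samples)[int(len(samples) * 0.99)]

minimum = [send_to_minimum() for _ in range(10_000)]
all_reps = [send_to_all() for _ in range(10_000)]

print(f"send-to-minimum p99: {p99(minimum):6.1f} ms")
print(f"send-to-all     p99: {p99(all_reps):6.1f} ms")
```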

Original title and link: Doing Redundant Work to Speed Up Distributed Queries (NoSQL database©myNoSQL)


Dynamic Reconfiguration of Apache ZooKeeper

A slide deck by Alexander Shraer providing a shorter version of the Dynamic Reconfiguration of Primary/Backup Clusters in Apache ZooKeeper paper.

There are still some scenarios where the proposed algorithm will not work, but I cannot tell how often these will occur:

  1. Quorum of new ensemble must be in sync
  2. Another reconfig in progress
  3. Version condition check fails
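These failure cases map naturally onto a compare-and-set style membership store. Below is a toy sketch (my own illustration, not ZooKeeper's implementation) showing how scenarios 2 and 3 surface as rejected requests:

```python
class ConfigStore:
    """Toy version-conditioned cluster reconfiguration (illustrative only).
    A reconfig request may carry the configuration version it was based on;
    if the version has moved, the request fails instead of clobbering a
    concurrent change."""

    def __init__(self, members):
        self.members = set(members)
        self.version = 1
        self.reconfig_in_progress = False

    def reconfig(self, new_members, expected_version=None):
        if self.reconfig_in_progress:
            raise RuntimeError("another reconfig in progress")    # scenario 2
        if expected_version is not None and expected_version != self.version:
            raise RuntimeError("version condition check failed")  # scenario 3
        self.reconfig_in_progress = True
        try:
            self.members = set(new_members)
            self.version += 1
        finally:
            self.reconfig_in_progress = False

store = ConfigStore(["A", "B", "C"])
store.reconfig(["A", "B", "C", "D"], expected_version=1)  # succeeds
try:
    store.reconfig(["A", "B"], expected_version=1)        # stale version
except RuntimeError as e:
    print(e)
```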

One of the most interesting slides in the deck is the one explaining the failure-free flow:

ZooKeeper Failure Free Flow

The Behavior of EC2/EBS Metadata Replicated Datastore

The Amazon post about the service disruption that happened late last month provides an interesting description of the behavior of the Amazon EC2 and EBS metadata datastores:

The EC2 and EBS APIs are implemented on multi-Availability Zone replicated datastores. These datastores are used to store metadata for resources such as instances, volumes, and snapshots. To protect against datastore corruption, currently when the primary copy loses power, the system automatically flips to a read-only mode in the other Availability Zones until power is restored to the affected Availability Zone or until we determine it is safe to promote another copy to primary.
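The flip-to-read-only behavior described above can be sketched as a small state machine (my own toy model, not Amazon's implementation): when the primary copy is lost, surviving replicas keep serving reads but refuse writes until a copy is safely promoted.

```python
class ReplicatedMetadataStore:
    """Toy model of a multi-AZ replicated metadata store that goes
    read-only when the primary copy is lost (illustrative only)."""

    def __init__(self, zones):
        self.zones = set(zones)
        self.primary = zones[0]
        self.data = {}

    def write(self, key, value):
        if self.primary is None:
            raise RuntimeError("read-only mode: no primary, writes rejected")
        self.data[key] = value

    def read(self, key):
        # Reads keep working from replicas in the healthy zones.
        return self.data.get(key)

    def primary_zone_lost(self):
        self.zones.discard(self.primary)
        self.primary = None            # flip to read-only mode

    def promote(self, zone):
        # Operator (or automation) decides it is safe to promote a replica.
        assert zone in self.zones
        self.primary = zone

store = ReplicatedMetadataStore(["us-east-1a", "us-east-1b", "us-east-1c"])
store.write("vol-123", {"state": "available"})
store.primary_zone_lost()
print(store.read("vol-123"))           # reads are still served
try:
    store.write("vol-123", {"state": "in-use"})
except RuntimeError as e:
    print(e)                           # writes are rejected
store.promote("us-east-1b")
store.write("vol-123", {"state": "in-use"})
```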

Original title and link: The Behavior of EC2/EBS Metadata Replicated Datastore (NoSQL database©myNoSQL)


Dynamic Reconfiguration of Primary/Backup Clusters in Apache ZooKeeper

A paper by Alexander Shraer, Benjamin Reed, Flavio Junqueira (Yahoo) and Dahlia Malkhi (Microsoft):

Dynamically changing (reconfiguring) the membership of a replicated distributed system while preserving data consistency and system availability is a challenging problem. In this paper, we show that reconfiguration can be simplified by taking advantage of certain properties commonly provided by Primary/Backup systems. We describe a new reconfiguration protocol, recently implemented in Apache Zookeeper. It fully automates configuration changes and minimizes any interruption in service to clients while maintaining data consistency. By leveraging the properties already provided by Zookeeper our protocol is considerably simpler than state of the art.

The corresponding ZooKeeper issue was created in 2008, and the new protocol should be part of ZooKeeper 3.5.0.

Dapper, a Large-Scale Distributed Systems Tracing Infrastructure

Google’s paper about their large-scale distributed systems tracing solution Dapper which inspired Twitter’s Zipkin:

Here we introduce the design of Dapper, Google’s production distributed systems tracing infrastructure, and describe how our design goals of low overhead, application-level transparency, and ubiquitous deployment on a very large scale system were met. Dapper shares conceptual similarities with other tracing systems, particularly Magpie [3] and X-Trace [12], but certain design choices were made that have been key to its success in our environment, such as the use of sampling and restricting the instrumentation to a rather small number of common libraries.

Download or read the paper after the break.

Tracing Distributed Systems With Twitter Zipkin

Whenever a request reaches Twitter, we decide if the request should be sampled. We attach a few lightweight trace identifiers and pass them along to all the services used in that request. By only sampling a portion of all the requests we reduce the overhead of tracing, allowing us to always have it enabled in production.

The Zipkin collector receives the data via Scribe and stores it in Cassandra along with a few indexes. The indexes are used by the Zipkin query daemon to find interesting traces to display in the web UI.
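The two ideas in the excerpt, deciding once at the edge whether to sample and then passing lightweight identifiers downstream, can be sketched as follows (a toy illustration with header names modeled on Zipkin's B3 propagation convention; the sampling rate and helper names are invented):

```python
import random
import uuid

SAMPLE_RATE = 0.001  # trace 0.1% of requests (hypothetical rate)

collector = []       # stand-in for the Scribe -> Cassandra pipeline

def start_trace():
    """Decide at the edge, once, whether this request is sampled,
    and mint the identifiers that travel with it."""
    return {
        "trace_id": uuid.uuid4().hex[:16],
        "span_id": uuid.uuid4().hex[:16],
        "sampled": random.random() < SAMPLE_RATE,
    }

def outgoing_headers(trace):
    """Attach the trace identifiers to a downstream RPC."""
    return {
        "X-B3-TraceId": trace["trace_id"],
        "X-B3-SpanId": uuid.uuid4().hex[:16],   # new span per hop
        "X-B3-ParentSpanId": trace["span_id"],
        "X-B3-Sampled": "1" if trace["sampled"] else "0",
    }

def record_span(trace, annotation):
    """Only sampled requests pay the cost of emitting span data."""
    if trace["sampled"]:
        collector.append((trace["trace_id"], annotation))

trace = start_trace()
record_span(trace, "server recv")
headers = outgoing_headers(trace)   # forwarded to every downstream service
record_span(trace, "server send")
```

Because the sampling decision rides along in the headers, downstream services never re-decide; an unsampled request costs almost nothing end to end.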

There are many APM solutions out there, but sometimes the overhead or the lack of support for specific components leads to the need for custom solutions like this one. While in a different field, this is just another example of why products and services should be designed with openness and integration in mind.

Original title and link: Tracing Distributed Systems With Twitter Zipkin (NoSQL database©myNoSQL)


Four Golden Rules of High Availability. Is Self-Healing a Requirement of Highly Available Systems?

Jared Wray enumerates the following four rules for High Availability:

  • No Single Point of failure
  • Self-healing is Required
  • It will go down so plan on it
  • It is going to cost more: […] The discussion instead should be what downtime is acceptable for the business.

I’m not sure there’s a very specific definition of high availability, but the always correct Wikipedia says:

High availability is a system design approach and associated service implementation that ensures a prearranged level of operational performance will be met during a contractual measurement period.

This got me thinking: is self-healing actually a requirement? Could I translate this into asking whether it is possible to control the MTTF? Control in the sense of planning operations that would keep the MTTF in a range that is not considered to break the SLA.

Jim Gray and Daniel P. Siewiorek wrote in their “High Availability Computer Systems”:

The key concepts and techniques used to build high availability computer systems are (1) modularity, (2) fail-fast modules, (3) independent failure modes, (4) redundancy, and (5) repair. These ideas apply to hardware, to design, and to software. They also apply to tolerating operations faults and environmental faults.

Notice the lack of the “self” part. So is self-healing a requirement of highly available systems?
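Gray and Siewiorek's five concepts can be sketched in a toy example (my own illustration, not from the paper); note that here the repair step is explicit and externally triggered, nothing "self" about it:

```python
class FailFastModule:
    """(2) Fail-fast: the module either returns a correct answer or stops
    immediately, rather than limping along and corrupting state."""

    def __init__(self, name):
        self.name = name
        self.failed = False

    def call(self, x):
        if self.failed:
            raise RuntimeError(f"{self.name} failed fast")
        return x * 2

class RedundantPair:
    """(1) Modularity and (3)/(4) independent, redundant replicas behind
    one interface; failover is just 'try the next module'."""

    def __init__(self):
        self.modules = [FailFastModule("primary"), FailFastModule("backup")]

    def call(self, x):
        for m in self.modules:
            try:
                return m.call(x)
            except RuntimeError:
                continue
        raise RuntimeError("all replicas failed")

    def repair(self):
        # (5) Repair restores redundancy before a second failure can hit.
        for m in self.modules:
            m.failed = False

pair = RedundantPair()
assert pair.call(21) == 42       # normal operation
pair.modules[0].failed = True    # primary fails fast
assert pair.call(21) == 42       # backup masks the failure
pair.repair()                    # redundancy restored by an external step
```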

Original title and link: Four Golden Rules of High Availability. Is Self-Healing a Requirement of Highly Available Systems? (NoSQL database©myNoSQL)

Wordnik: Migrating From a Monolithic Platform to Micro Services

The story of how Wordnik changed a monolithic platform to one based on Micro Services and the implications at the data layer (MongoDB):

To address this, we made a significant architectural shift. We have split our application stack into something called Micro Services — a term that I first heard from the folks at Netflix. […] This translates to the data tier as well. We have low cost servers, and they work extremely well when they stay relatively small. Make them too big and things can go sour, quickly. So from the data tier, each service gets its own data cluster. This keeps services extremely focused, compact, and fast — there’s almost no fear that some other consumer of a shared data tier is going to perform some ridiculously slow operation which craters the runtime performance. Have you ever seen what happens when a BI tool is pointed at the runtime database? This is no different.
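The "each service gets its own data cluster" rule can be made concrete with a configuration sketch (service names and connection strings are invented for illustration): the isolation comes from the fact that no two services ever share a connection string, so the BI tool's cluster is physically separate from the runtime ones.

```python
# Hypothetical per-service data-cluster mapping: one cluster per service,
# so a slow consumer cannot crater another service's database.
SERVICE_DATA_CLUSTERS = {
    "word-service": "mongodb://words-db-1,words-db-2/words",
    "user-service": "mongodb://users-db-1,users-db-2/users",
    "analytics":    "mongodb://analytics-db-1/analytics",  # BI stays isolated
}

def cluster_for(service):
    """A service can only ever reach its own cluster."""
    return SERVICE_DATA_CLUSTERS[service]

print(cluster_for("analytics"))
```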

Original title and link: Wordnik: Migrating From a Monolithic Platform to Micro Services (NoSQL database©myNoSQL)


When More Machines Equals Worse Results

Galileo observed how things broke if they were naively scaled up.

Google found the larger the scale the greater the impact of latency variability. When a request is implemented by work done in parallel, as is common with today’s service oriented systems, the overall response time is dominated by the long tail distribution of the parallel operations. Every response must have a consistent and low latency or the overall operation response time will be tragically slow.

Fantastic post from Todd Hoff on the (hopefully) well-known truth: “the response time of a distributed parallel system is the time of its slowest component.”
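The tail-domination effect is easy to quantify. If each of n parallel calls is independently slow with probability p, the chance that a fan-out request hits at least one straggler is 1 - (1 - p)^n; at p = 1% and 100-way fan-out, roughly two thirds of requests are slow:

```python
# Chance that a fan-out request is delayed by at least one straggler,
# assuming each of n parallel calls is independently slow with probability p.
def p_any_slow(n, p=0.01):
    return 1 - (1 - p) ** n

for n in (1, 10, 100):
    print(f"{n:3d} parallel calls -> {p_any_slow(n):.0%} of requests hit a straggler")
```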

Original title and link: When More Machines Equals Worse Results (NoSQL database©myNoSQL)