NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter



distributed systems: All content tagged as distributed systems in NoSQL databases and polyglot persistence

A simple distributed algorithm for small idempotent information

In this blog post I’m going to describe a very simple distributed algorithm that is useful in different programming scenarios. The algorithm is useful when you need to take some kind of information synchronized among a number of processes. The information can be everything as long as it is composed of a small number of bytes, and as long as it is idempotent, that is, the current value of the information does not depend on the previous value, and we can just replace an old value, with the new one.

While reading this post from Salvatore Sanfilippo all I was visualizing were the diagrams in James Micken’s “The saddest moment” paper.

Original title and link: A simple distributed algorithm for small idempotent information (NoSQL database©myNoSQL)


Watching a presentation on Byzantine fault tolerance is similar to watching a foreign film

James Mickens in “The saddest moment“:

In conclusion, I think that humanity should stop publishing papers about Byzantine fault tolerance. I do not blame my fellow researchers for trying to publish in this area, in the same limited sense that I do not blame crackheads for wanting to acquire and then consume cocaine. The desire to make systems more reliable is a powerful one; unfortunately, this addiction, if left unchecked, will inescapably lead to madness and/or tech reports that contain 167 pages of diagrams and proofs. Even if we break the will of the machines with formalism and cryptography, we will never be able to put Ted inside of an encrypted, nested log, and while the datacenter burns and we frantically call Ted’s pager, we will realize that Ted has already left for the cafeteria.

One of the shortest and delightful articles about the complexity of distributed systems.

Busy weekend ahead - Ricon West videos available

Ricon and QCon are the 2 conferences I regret most for missing this fall. Luckly both of them are publishing videos of the talks, so I’m posting this as early as possible on Saturday morning so you’ll have a busy weekend.

All videos from Ricon West 2013 are on YouTube1. Here’s my short playlist:

  1. Michael Bernstein: Distributed Systems Archeology. I watched this one live, but I’ll rewatch it as it’s both extremely interesting and entertaining.
  2. Peter Bailis: Bad As I Wanna Be: Coordination and Consistency in Distributed Databases and Diego Ongaro: The Raft Consensus Algorithm. These are 2 guys we’ll (continue) to hear a lot in the space of distributed systems. Even if their field is consistency and transactions.
  3. Jason Brown: Dynamic Dynamos: Comparing Riak and Cassandra. High availability? These are basically your options.
  4. Jordan West: Controlled Epidemics: Riak’s New Gossip Protocol and Metadata Store and Joseph Blomstedt: Bringing Consistency to Riak (Part 2). What Basho’s is working on.
  5. Jeff Hodges: Practicalities of Productionizing Distributed Systems. Experiences from the field.

I just wish , for my marriage’s sake, these would have been available during the long Thanksgiving weekend.

  1. Baho’s posts about the videos and their recommendations are here and here

Original title and link: Busy weekend ahead - Ricon West videos available (NoSQL database©myNoSQL)

Distributed systems for fun and profit

A single page, long, but great read about distributed systems:

In this text I’ve tried to provide a more accessible introduction to distributed systems. To me, that means two things: introducing the key concepts that you will need in order to have a good time reading more serious texts, and providing a narrative that covers things in enough detail that you get a gist of what’s going on without getting stuck on details. It’s 2013, you’ve got the Internet, and you can selectively read more about the topics you find most interesting.

Original title and link: Distributed systems for fun and profit (NoSQL database©myNoSQL)


Paxos serialization, serializability, and proactive serialization

Professor Murat Demirbas has a (short) post looking at the Paxos serialization, comparing it with serializability and then introducing the notion of proactive serialization:

In fact Paxos serialization is overkill, it is too strong. Paxos will serialize operations in a total order, which is not necessarily needed for sequential consistency. Today in many applications where knowing the total order and replicated-logging of that order is not important, Paxos is still (ab)used.

Indeed the post doesn’t offer too many details about proactive serialization, but while thinking about it here were my first questions:

  1. what would be the behavior of the system for the cases where the prediction for locks is incorrect? Somehow the behavior of the system would need to account for both false positives and false negatives.
  2. would a system using proactive serialization still need a coordinator? A master-service? (nb: if I’m reading the post correctly, it seems that the system would rely on a lock-service master)
  3. if there isn’t a coordinator who would make sure the locks are released when failures occur?

Original title and link: Paxos serialization, serializability, and proactive serialization (NoSQL database©myNoSQL)


Ricon East Videos - Talks about distributed systems and Riak

This is the weekend of videos… I’ve already posted Cassandra Summit’s Bests and Top 5 Presentations from MongoNYC.

Basho has published the majority of the presentations from their RICON East 2013 event. I’ve been lucky to be at Ricon West 2012 and it was a fantastic conference. So I think you’ll really enjoy some of videos.

Here’s the list I would start with:

  • Automatically Scalable Computation by Herchel Smith
  • ZooKeeper for the skeptical architect by Camille Fournier
  • Optimizing LevelDB for performance and scale by Matthew Von-Maszewki

Original title and link: Ricon East Videos - Talks about distributed systems and Riak (NoSQL database©myNoSQL)

And the prize goes to… FoundationDB Fault Tolerance Demo

Best. Demo. Ever.

Some details about the demo and the setup can be found in the blog post and in the HN thread.

Original title and link: And the prize goes to… FoundationDB Fault Tolerance Demo (NoSQL database©myNoSQL)

Assorted distributed databases readings by Peter Bailis

If Peter Bailis thought I’ll miss this list… well, I didn’t. It’s a lot to read, so let’s get to it:

Original title and link: Assorted distributed databases readings by Peter Bailis (NoSQL database©myNoSQL)

Video: Advancing Distributed Systems by Eric Brewer

This is the type of presentation you don’t want to miss even if it’s a couple of months old1:

Battle-Test Your MongoDB Cluster

Kristina Chodorow1 shared a good list of tests to put a MongoDB cluster through:

Here are some exercises to battle-test your MongoDB instance before going into production. You’ll need a Database Master (aka DM) to make bad things happen to your MongoDB install and one or more players to try to fix it.

Netflix is using a series of tools that perform similar tests against their Cassandra clusters. With a small twist: they are run against the production clusters.

  1. In a recent post, Kristina Chodorow, one of the most prominent figures of the MongoDB world, has announced she has decided to become a Googler. Good luck Kristina! 

Original title and link: Battle-Test Your MongoDB Cluster (NoSQL database©myNoSQL)


Using Apache ZooKeeper to Build Distributed Apps (And Why)

Great intro to ZooKeeper and the problems it can help solve by Sean Mackrory:

Done wrong, distributed software can be the most difficult to debug. Done right, however, it can allow you to process more data, more reliably and in less time. Using Apache ZooKeeper allows you to confidently reason about the state of your data, and coordinate your cluster the right way. You’ve seen how easy it is to get a ZooKeeper server up and running. (In fact, if you’re a CDH user, you may already have an ensemble running!) Think about how ZooKeeper could help you build more robust systems

Leaving aside for a second the main topic of the post, another important lesson here is that the NIH syndrom in distributed systems is very expensive.

Original title and link: Using Apache ZooKeeper to Build Distributed Apps (And Why) (NoSQL database©myNoSQL)


Introducing Highly Available Transactions: The Relationship Between CAP and ACID Transactions

Learning from Peter Bailis:

While the CAP Theorem is fairly well understood, the relationship between CAP and ACID transactions is not. If we consider the current lack of highly available systems providing arbitrary multi-object operations with ACID-like semantics, it appears that CAP and transactions are incompatible. This is partly due to the historical design of distributed database systems, which typically chose consistency over high availability. Standard database techniques like two-phase locking and multi-version concurrency control do not typically perform well in the event of partial failure, and the master-based (i.e., master-per-shard) and overlapping quorum-based techniques often adopted by many distributed database designs are similarly unavailable if users are partitioned from the anointed primary copies.

There’s also a paper (PDF) authored by Peter Bailis, Alan Fekete, Ali Ghodsi, Joseph m. Hellerstein, Ion Stoica. These names should tell you something.

Original title and link: Introducing Highly Available Transactions: The Relationship Between CAP and ACID Transactions (NoSQL database©myNoSQL)