distributed systems: All content tagged as distributed systems in NoSQL databases and polyglot persistence

Defining Linearizability and Serializability

Peter Bailis provides definitions for linearizability and serializability in plain English:

One of the reasons these definitions are so confusing is that linearizability hails from the distributed systems and concurrent programming communities, and serializability comes from the database community. Today, almost everyone uses both distributed systems and databases, which often leads to overloaded terminology (e.g., “consistency,” “atomicity”).

Original title and link: Defining Linearizability and Serializability (NoSQL database©myNoSQL)


Leslie Lamport on Distributed Systems

Something you really do not want to miss: the SE-Radio interview with Leslie Lamport on Distributed systems:

The interview begins with a definition: a distributed system is a multiprocessor system in which the time required for interprocess communication is large compared to the time for events within a single processor–in other words, it takes longer for interprocess communication than it does for a process to look at its own memory. Alternatively, a distributed system is one in which processors communicate by sending messages. Leslie goes on to talk about how he became interested in distributed systems, and describes the story behind his paper about the Paxos algorithm. The goal of Paxos is to maintain consensus in an environment with unexpected faults (otherwise known as Byzantine faults). After the discussion of Paxos, Jeff asks Leslie about his recent talk “Thinking for Programmers,” which emphasizes the benefit of having a specification prior to writing actual code. “Specification” can mean a variety of things, but predicates and next-state relationships provide a mathematical rigor that is well-suited to distributed and concurrent systems. The conversation concludes with Jeff asking Leslie about how a programmer can build the mental resolve to work through a difficult problem.
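To make the "predicates and next-state relationships" idea concrete, here is a minimal sketch in Python. It is not taken from the interview or from Lamport's talk: the two-process lock system, the state layout, and every name below (`INIT`, `next_states`, `mutual_exclusion`, `find_violation`) are assumptions chosen for illustration. The specification is an initial state, a next-state relation, and an invariant predicate; a small breadth-first search then checks the invariant over every reachable state.

```python
from collections import deque

# State: (pc of process 0, pc of process 1, lock held). All names are illustrative.
INIT = ("idle", "idle", False)

def next_states(state, use_lock=True):
    """Next-state relation: yield every state reachable from `state` in one step."""
    pcs, lock = state[:2], state[2]
    for i in (0, 1):
        if pcs[i] == "idle" and not (use_lock and lock):
            new = list(pcs)
            new[i] = "critical"          # process i enters the critical section
            yield (new[0], new[1], True)
        elif pcs[i] == "critical":
            new = list(pcs)
            new[i] = "idle"              # process i leaves and frees the lock
            yield (new[0], new[1], False)

def mutual_exclusion(state):
    """Invariant predicate: at most one process is in the critical section."""
    return not (state[0] == "critical" and state[1] == "critical")

def find_violation(init, step, invariant):
    """Explore all reachable states breadth-first; return a bad state or None."""
    seen, queue = {init}, deque([init])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            return state
        for nxt in step(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None

print(find_violation(INIT, next_states, mutual_exclusion))
print(find_violation(INIT, lambda s: next_states(s, use_lock=False), mutual_exclusion))
```

With the lock respected, no reachable state breaks the invariant. With `use_lock=False`, the search reaches a state where both processes are in the critical section, which is the kind of mistake a specification can expose before any production code exists.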



Don't Settle for Eventual Consistency - Causal consistency

An ACM article on consistency and replication:

Causal consistency ensures that operations appear in the order the user intuitively expects. More precisely, it enforces a partial order over operations that agrees with the notion of potential causality. If operation A happens before operation B, then any data center that sees operation B must see operation A first.

Three rules define potential causality:

  1. Thread of execution. If a and b are two operations in a single thread of execution, then a -> b if operation a happens before operation b.

  2. Reads-from. If a is a write operation and b is a read operation that returns the value written by a, then a -> b.

  3. Transitivity. For operations a, b, and c, if a -> b and b -> c, then a -> c. Thus, the causal relationship between operations is the transitive closure of the first two rules.

Causal consistency ensures that operations appear in an order that agrees with these rules. This makes users happy because their operations are applied everywhere in the order they intended. It makes programmers happy because they no longer have to reason about out-of-order operations.
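The three rules quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the article's implementation: the operation names, the `happens_before` and `respects_causality` helpers, and the two-thread example are all hypothetical. The first function builds the happens-before relation as the transitive closure of rules 1 and 2; the second checks whether the order in which one data center applied operations agrees with it (it only compares pairs where both operations appear in that order).

```python
from itertools import product

def happens_before(threads, reads_from):
    """Return the set of (a, b) pairs such that a -> b.

    threads: list of lists of operation names, each list in program order.
    reads_from: set of (write, read) pairs where the read returned the write's value.
    """
    edges = set(reads_from)                      # rule 2: reads-from
    for ops in threads:                          # rule 1: thread of execution
        for i in range(len(ops)):
            for j in range(i + 1, len(ops)):
                edges.add((ops[i], ops[j]))
    changed = True                               # rule 3: transitivity
    while changed:                               # naive closure, run to a fixpoint
        changed = False
        for (a, b), (c, d) in product(list(edges), repeat=2):
            if b == c and (a, d) not in edges:
                edges.add((a, d))
                changed = True
    return edges

def respects_causality(applied_order, hb):
    """True if no operation is applied before one that causally precedes it.

    Only pairs where both operations appear in applied_order are checked.
    """
    pos = {op: i for i, op in enumerate(applied_order)}
    return all(pos[a] < pos[b] for a, b in hb if a in pos and b in pos)

# Hypothetical example: thread A writes x then y; thread B reads y, then writes z.
threads = [["A:write_x", "A:write_y"], ["B:read_y", "B:write_z"]]
reads_from = {("A:write_y", "B:read_y")}
hb = happens_before(threads, reads_from)

print(("A:write_x", "B:write_z") in hb)
print(respects_causality(["A:write_x", "A:write_y", "B:write_z"], hb))
print(respects_causality(["B:write_z", "A:write_x"], hb))
```

The first check shows that the write of `x` causally precedes the write of `z` even though they come from different threads, because the chain runs through the read of `y`. Operations with no such chain between them stay unordered, which is why the relation is only a partial order.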

I read this article in the small hours and I’m still digesting it, but for now my impression is that it describes a single-client ordering of operations:

  1. Is it always necessary to serialize the operations of a single thread? I think this is useful only when the operations touch overlapping data sets.
  2. The “Reads-from” rule seems to need more assumptions (e.g., that both operations originate from the same client).
  3. What’s the connection between these rules (which dictate a single-client ordering of operations) and replication?



A simple distributed algorithm for small idempotent information

In this blog post I’m going to describe a very simple distributed algorithm that is useful in different programming scenarios. The algorithm is useful when you need to take some kind of information synchronized among a number of processes. The information can be everything as long as it is composed of a small number of bytes, and as long as it is idempotent, that is, the current value of the information does not depend on the previous value, and we can just replace an old value, with the new one.

While reading this post from Salvatore Sanfilippo, all I could visualize were the diagrams in James Mickens’s “The saddest moment” paper.
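The excerpt does not spell out the algorithm, so the following is only one possible sketch of synchronizing a small, replaceable value, not necessarily the approach described in the post. It assumes each update carries a hypothetical stamp of (version, writer name), and that a node replaces its value whenever it receives a higher stamp. The `Node` class and `gossip_round` helper are illustrative names.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    stamp: tuple = (0, "")   # (version, writer name); the name breaks version ties
    value: bytes = b""

    def set(self, value):
        """Local update: bump the version and replace the old value outright."""
        self.stamp = (self.stamp[0] + 1, self.name)
        self.value = value

    def merge(self, stamp, value):
        """Adopt a remote value only if its stamp is newer than ours."""
        if stamp > self.stamp:
            self.stamp, self.value = stamp, value

def gossip_round(nodes):
    """Every node sends its current (stamp, value) to every other node."""
    for sender in nodes:
        for receiver in nodes:
            if receiver is not sender:
                receiver.merge(sender.stamp, sender.value)

a, b, c = Node("a"), Node("b"), Node("c")
a.set(b"master=10.0.0.1")    # two concurrent updates with the same version
b.set(b"master=10.0.0.2")
gossip_round([a, b, c])
print(a.value, b.value, c.value)
```

Because the new value never depends on the old one, applying the same message twice, or delivering messages in a different order, leaves every node in the same final state. That is what makes such a simple exchange sufficient in this sketch.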



Watching a presentation on Byzantine fault tolerance is similar to watching a foreign film

James Mickens in “The saddest moment”:

In conclusion, I think that humanity should stop publishing papers about Byzantine fault tolerance. I do not blame my fellow researchers for trying to publish in this area, in the same limited sense that I do not blame crackheads for wanting to acquire and then consume cocaine. The desire to make systems more reliable is a powerful one; unfortunately, this addiction, if left unchecked, will inescapably lead to madness and/or tech reports that contain 167 pages of diagrams and proofs. Even if we break the will of the machines with formalism and cryptography, we will never be able to put Ted inside of an encrypted, nested log, and while the datacenter burns and we frantically call Ted’s pager, we will realize that Ted has already left for the cafeteria.

One of the shortest and most delightful articles about the complexity of distributed systems.

Busy weekend ahead - Ricon West videos available

Ricon and QCon are the two conferences I most regret missing this fall. Luckily, both of them are publishing videos of the talks, so I’m posting this as early as possible on Saturday morning so you’ll have a busy weekend.

All videos from Ricon West 2013 are on YouTube [1]. Here’s my short playlist:

  1. Michael Bernstein: Distributed Systems Archeology. I watched this one live, but I’ll rewatch it as it’s both extremely interesting and entertaining.
  2. Peter Bailis: Bad As I Wanna Be: Coordination and Consistency in Distributed Databases and Diego Ongaro: The Raft Consensus Algorithm. These are two people we’ll (continue to) hear a lot from in the distributed systems space, even if their field is consistency and transactions.
  3. Jason Brown: Dynamic Dynamos: Comparing Riak and Cassandra. High availability? These are basically your options.
  4. Jordan West: Controlled Epidemics: Riak’s New Gossip Protocol and Metadata Store and Joseph Blomstedt: Bringing Consistency to Riak (Part 2). What Basho is working on.
  5. Jeff Hodges: Practicalities of Productionizing Distributed Systems. Experiences from the field.

I just wish, for my marriage’s sake, these had been available during the long Thanksgiving weekend.

  1. Basho’s posts about the videos and their recommendations are here and here


Distributed systems for fun and profit

A long, single-page, but great read about distributed systems:

In this text I’ve tried to provide a more accessible introduction to distributed systems. To me, that means two things: introducing the key concepts that you will need in order to have a good time reading more serious texts, and providing a narrative that covers things in enough detail that you get a gist of what’s going on without getting stuck on details. It’s 2013, you’ve got the Internet, and you can selectively read more about the topics you find most interesting.



Paxos serialization, serializability, and proactive serialization

Professor Murat Demirbas has a (short) post looking at Paxos serialization, comparing it with serializability, and then introducing the notion of proactive serialization:

In fact Paxos serialization is overkill, it is too strong. Paxos will serialize operations in a total order, which is not necessarily needed for sequential consistency. Today in many applications where knowing the total order and replicated-logging of that order is not important, Paxos is still (ab)used.

The post doesn’t offer many details about proactive serialization, but while thinking about it, these were my first questions:

  1. What would the system’s behavior be when the prediction for locks is incorrect? It would need to account for both false positives and false negatives.
  2. Would a system using proactive serialization still need a coordinator? A master service? (NB: if I’m reading the post correctly, the system would rely on a lock-service master.)
  3. If there isn’t a coordinator, who would make sure the locks are released when failures occur?



Ricon East Videos - Talks about distributed systems and Riak

This is the weekend of videos… I’ve already posted Cassandra Summit’s Bests and Top 5 Presentations from MongoNYC.

Basho has published the majority of the presentations from their RICON East 2013 event. I was lucky to be at Ricon West 2012, and it was a fantastic conference, so I think you’ll really enjoy some of the videos.

Here’s the list I would start with:

  • Automatically Scalable Computation by Margo Seltzer (Herchel Smith Professor of Computer Science, Harvard)
  • ZooKeeper for the skeptical architect by Camille Fournier
  • Optimizing LevelDB for performance and scale by Matthew Von-Maszewski


And the prize goes to… FoundationDB Fault Tolerance Demo

Best. Demo. Ever.

Some details about the demo and the setup can be found in the blog post and in the HN thread.


Assorted distributed databases readings by Peter Bailis

If Peter Bailis thought I’d miss this list… well, I didn’t. It’s a lot to read, so let’s get to it:


Video: Advancing Distributed Systems by Eric Brewer

This is the type of presentation you don’t want to miss even if it’s a couple of months old [1]: