NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter



distributed systems: All content tagged as distributed systems in NoSQL databases and polyglot persistence

Four Golden Rules of High Availability. Is Self-Healing a Requirement of Highly Available Systems?

Jared Wray enumerates the following 4 rules for High Availability :

  • No Single Point of failure
  • Self-healing is Required
  • It will go down so plan on it
  • It is going to cost more: […] The discussion instead should be what downtime is acceptable for the business.

I’m not sure there’s a very specific definition of high availability, but the always correct Wikipedia says:

High availability is a system design approach and associated service implementation that ensures a prearranged level of operational performance will be met during a contractual measurement period.

This got me thinking if self-healing is actually a requirement? Could I translated this into asking: is it possible to control the MTTF? Control in the sense of planning operations that would push MTTF into a range that is not consider to break the SLA.

Jim Gray and Daniel P. Siewiorek wrote in their “High Availability Computer Systems”:

The key concepts and techniques used to build high availability computer systems are (1) modularity, (2) fail-fast modules, (3) independent failure modes, (4) redundancy, and (5) repair. These ideas apply to hardware, to design, and to software. They also apply to tolerating operations faults and environmental faults.

Notice the lack of the “self” part. So is self-healing a requirement of highly available systems?

Original title and link: Four Golden Rules of High Availability. Is Self-Healing a Requirement of Highly Available Systems? (NoSQL database©myNoSQL)

Wordnik: Migrating From a Monolythic Platform to Micro Services

The story of how Wordnik changed a monolithic platform to one based on Micro Services and the implications at the data layer (MongoDB):

To address this, we made a significant architectural shift. We have split our application stack into something called Micro Services — a term that I first heard from the folks at Netflix. […] This translates to the data tier as well. We have low cost servers, and they work extremely well when they stay relatively small. Make them too big and things can go sour, quickly. So from the data tier, each service gets its own data cluster. This keeps services extremely focused, compact, and fast — there’s almost no fear that some other consumer of a shared data tier is going to perform some ridiculously slow operation which craters the runtime performance. Have you ever seen what happens when a BI tool is pointed at the runtime database? This is no different.

Original title and link: Wordnik: Migrating From a Monolythic Platform to Micro Services (NoSQL database©myNoSQL)


When More Machines Equals Worse Results

Galileo observed how things broke if they were naively scaled up.

Google found the larger the scale the greater the impact of latency variability. When a request is implemented by work done in parallel, as is common with today’s service oriented systems, the overall response time is dominated by the long tail distribution of the parallel operations. Every response must have a consistent and low latency or the overall operation response time will be tragically slow.

Fantastic post from Todd Hoff on the (hopefully) well known truth: “the reponse time in a distributed parallel systems is the time of the slowest component“.

Original title and link: When More Machines Equals Worse Results (NoSQL database©myNoSQL)


Distributed System Reliability: It's About Operations, Not Architecture or Design

Jay Kreps1:

I have come around to the view that the real core difficulty of these systems is operations, not architecture or design. Both are important but good operations can often work around the limitations of bad (or incomplete) software, but good software cannot run reliably with bad operations. […] I really think there is really only one thing to talk about with respect to reliability: continuous hours of successful production operations.

  1. Jay Kreps: works for LinkedIn where he is the technical lead for the SNA team that handles search, social graph, data infrastructure, and recommendation systems. 

Original title and link: Distributed System Reliability: It’s About Operations, Not Architecture or Design (NoSQL database©myNoSQL)


Visualizing System Latency

Besides the many practical lessons emphasized in Jack Clark’s interview with Adrian Cockcroft on ZDNet—luckly I’ve had the chance to see some of Cockcroft’s presentations about Netflix architecture and also talk to him directly—one thing that sticked with me was the ending paragraph:

The thing I’ve been publicly asking for has been better IO in the cloud. Obviously I want SSDs in there. We’ve been asking cloud vendors to do that for a while. With Cassandra, we’ve had to go onto horizontal scale and use the internal disks and triple replicate across availability zones, so you end up with a triple-redundant data store that is careful not to overload the disks.

That reminded me of this old ACM article authored by Brendan Gregg:

When I/O latency is presented as a visual heat map, some intriguing and beautiful patterns can emerge. These patterns provide insight into how a system is actually performing and what kinds of latency end-user applications experience. Many characteristics seen in these patterns are still not understood, but so far their analysis is revealing systemic behaviors that were previously unknown.

Visualizing system latency - Sequential disk reads, stepping disk count

I was wondering if in the NoSQL databases space (and data storage space in general) are there any of the monitoring tools that provide such advanced visualization of latency data. Do you know any?

Original title and link: Visualizing System Latency (NoSQL database©myNoSQL)

What Can Be Learned From Heroku Outage Postmortem

While some may learn a few new things or get a confirmation in the very details of the outage, what caught my attention in the Heroku’s postmortem analysis is the conclusions:

  • higher sensitivity and more aggressive monitoring on a variety of metrics
  • improved early warning systems
  • better containment
  • improved flow controls, both manual and automatic
  • expanding simulations of unusual load conditions in our staging environment

None of these are particular to a specific storage or NoSQL database. But they all reflect the reality of operating at large scale where even the most operationally friendly solutions—think of Dynamo-inspired NoSQL databases—cannot and should not be left unmonitored or unsupervised or with no clear recovery strategies and processes in place.

In the NoSQL world, one of the most covered outages was the MongoDB outage at Foursquare. And in case you don’t remember the details, most of the circumstances that led to that event could have been prevented by having:

  1. better monitoring
  2. early warnings
  3. better operational procedures

Aren’t these two lists looking very alike?

Original title and link: What Can Be Learned From Heroku Outage Postmortem (NoSQL database©myNoSQL)


Eventual and Strong Consistency, Sloppy and Strict Quorums, and Other Lessons and Thoughts on Distributed Systems

Anything I’d write would just steal from your time to read and think about the email Joseph Blomstedt posted to the Riak list.

Original title and link: Eventual and Strong Consistency, Sloppy and Strict Quorums, and Other Lessons and Thoughts on Distributed Systems (NoSQL database©myNoSQL)

Spotify Architecture: The Peer to Peer Network

Spotify architecture

  • Spotify’s p2p network works like a BitTorrent network to locate peers […]. It uses a proprietary protocol designed especially for streaming music.
  • There’s no “preferred” peers or supernodes, but a future improvement might be to use peer-to-peer overlays to exploit the overlap in interests between users.
  • The maximum number of peers in the network is 60, with a soft-limit of 50 peers.
  • The client uploads to at most 4 peers at a time.
  • Server-side trackers and network queries are used to locate other users who have the music you’re listening to.
  • Spotify uses TCP as the transport protocol instead of UDP, since it can take advantage of TCP’s congestion controls and ability to re-send lost packets.
  • Most users have a large cache […]. This helps keep network traffic down since most users listen to tracks more than once.
  • At Spotify’s end, there’s a master storage area (290TB) and two production storage areas (90TB in London, 90TB in Stockholm).

How many characteristics of the distributed NoSQL databases can you identify in Spotify’s architecture?

Original title and link: Spotify Architecture: The Peer to Peer Network (NoSQL database©myNoSQL)


Apache ZooKeeper 3.4.0 Released to Be Followed Soon by Production-Ready Version

Apache ZooKeeper, the high-performance coordination service exposing services like naming, configuration management, synchronization, etc. for distributed applications, has reached version 3.4.0.

Even if the official announcement was laconic, ZooKeeper 3.4.0 features over 150 fixes.

The most important ones are summarized by Patrick Hunt in this Cloudera blog post:

  • ZooKeeper 3.3.3 clients are compatible with 3.4.0 servers
  • Native Windows version of C client
  • Support Kerberos authentication of clients
  • Support Kerberos authentication of clients
  • Improved REST Interface
  • Existing monitoring support has been extended through the introduction of a new ‘mntr’ 4 letter word
  • Add tools and recipes for monitoring as a contrib
  • Web-based Administrative Interface
  • Automating log and snapshot cleaning
  • Add logging/stats to identify production deployment issues
  • Support for building RPM and DEB packages

Something to keep in mind though: ZooKeeper 3.4.0 is not production ready yet. After extensive testing, it will be followed soon by a minor release that will be production-ready.

Original title and link: Apache ZooKeeper 3.4.0 Released to Be Followed Soon by Production-Ready Version (NoSQL database©myNoSQL)

The Main Benefits of Riak

Dave Smith[1] about Riak:

Low, predictable latency data access and operational ease-of-use. When used appropriately (i.e. don’t expect Riak to be a RDBMS), Riak is able to provide an excellent latency profile, even in the case of multiple node failures. In addition, it’s easy to add/remove nodes while the system is running without a lot of operational juggling. Another useful benefit is that multiple nodes can take writes for the same key, thus making it easier to construct a geographically distributed data store.

  1. Dave Smith, Director of Engineering at Basho Technologies, Inc.  

Original title and link: The Main Benefits of Riak (NoSQL database©myNoSQL)


Zynga, Data Centers, Polyglot Persistence, and Big Data

Cadir Lee (CTO Zynga) quoted in a VentureBeat post:

It’s not the amount of hardware that matters. It’s the architecture of the application. You have to work at making your app architecture so that it takes advantage of Amazon. You have to have complete fluidity with the storage tier, the web tier. We are running our own data centers. We are looking more at doing our own data centers with more of a private cloud.

Couple of thoughts:

  1. Zynga is going the opposite direction than Netflix. While Netflix is focusing (by using Amazon for most of their infrastructure), Zynga is diversifying (building their own data centers) .
  2. Zynga’s applications are great examples of where fully distributed NoSQL databases fit. Availability is key.
  3. My answer to the question: “how many Zyngas are out there” would be: “enough to ensure some good business for the most reliable and scalable distributed databases”
  4. Zynga has contributed and is an investor in Membase, the company that merged with CouchOne to form Couchbase. But Zynga was using a custom version of Membase.
  5. Zynga also operates a large MySQL cluster.
  6. Zynga processes over 15 terabytes of game data every day (according to their SEC filing ). That’s Hadoop sweet spot.

PS: I’d love to talk to someone from Zynga about their data storage approach. If you have any connections I’d really appreciate an introduction.

Original title and link: Zynga, Data Centers, Polyglot Persistence, and Big Data (NoSQL database©myNoSQL)

Google: 'At Scale, Everything Breaks'

Urs Hölzle interviewed by ZDNet:

I think the big challenges haven’t changed that much. I’d say that it’s dealing with failure, because at scale everything breaks no matter what you do and you have to deal reasonably cleanly with that and try to hide it from the people actually using your system.

Make sure you read the whole article, especially the part about what Hölzle considers to be today’s captivating technical problems: fault tolerance, dealing with state and mutable states, using commodity hardware while dealing with rapid innovation cycles.

Original title and link: Google: ‘At Scale, Everything Breaks’ (NoSQL database©myNoSQL)