


riak: All content tagged as riak in NoSQL databases and polyglot persistence

Deploying Riak on EC2 - What to Pick?

Deepak Bala shares his recommendations for running Riak on EC2, based on his own experience:

There are a couple of problems to field when deploying Riak.

  1. The EC2 instances that are provisioned by default change the following on restart:

    • Private IP address
    • Public IP address
    • Private DNS
    • Public DNS
    • EBS instances provide stable, durable storage, while ephemeral storage provides better, more predictable performance at the cost of losing data on restarts.
  2. Performance.

Original title and link: Deploying Riak on EC2 - What to Pick? (NoSQL database©myNoSQL)


Setting Up a Riak Dev Cluster on OS X Mountain Lion

This script might come in handy, considering that the last time I tried, I failed to set up a Riak dev cluster.


Riak Automated Hosting on Engine Yard

The Basho team is collaborating with Engine Yard to simplify Riak deployments:

With Riak on Engine Yard, you can deploy a Riak cluster as simply as defining some configuration values and clicking “Add Cluster.”

To users, ease of deployment is valuable in itself. I’d be interested, though, to learn whether the Engine Yard offering comes with any extra benefits compared to other automated Riak hosting solutions.



Riak 2i or Key Filters: Which Is Faster?

Matt Snyder:

We had been using 2i for most of our querying and a combination of “Data Point Objects” and MapReduce for our more analytical needs. However when our MapReduce started bombing we questioned/reviewed our querying approach.


So my question to all of you is why is 2i/$key being recommended over Key Filters?

It’s a good question and I’d love to know the answer myself.
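For readers who have not used either feature, here is a rough sketch of the two approaches being compared, expressed as MapReduce inputs. It only builds the request payloads and sends nothing. The bucket name and key range are hypothetical, and the payload shapes follow my reading of the Riak 1.x HTTP API, so check them against the documentation for your version.

```python
from urllib.parse import quote


def key_range_via_2i(bucket, start, end):
    """MapReduce input that selects keys through the special $key index."""
    return {"bucket": bucket, "index": "$key", "start": start, "end": end}


def key_range_via_key_filter(bucket, start, end):
    """MapReduce input that selects the same keys with a 'between' key filter."""
    return {"bucket": bucket, "key_filters": [["between", start, end]]}


def mapreduce_job(inputs):
    # A single identity reduce phase just returns the matching bucket/key pairs.
    return {
        "inputs": inputs,
        "query": [{"reduce": {"language": "erlang",
                              "module": "riak_kv_mapreduce",
                              "function": "reduce_identity"}}],
    }


def index_range_url(bucket, start, end):
    """Path for a plain 2i range query over HTTP (no MapReduce involved)."""
    return "/buckets/{}/index/$key/{}/{}".format(
        quote(bucket, safe=""), quote(start, safe=""), quote(end, safe=""))


# Hypothetical bucket and key range, for illustration only.
job_2i = mapreduce_job(key_range_via_2i("events", "2012-10-01", "2012-10-31"))
job_kf = mapreduce_job(key_range_via_key_filter("events", "2012-10-01", "2012-10-31"))
```

Both jobs would be posted to the same MapReduce endpoint; only the `inputs` section differs, which is what makes a speed comparison between the two meaningful.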



Hosted Riak With Riak-On

First, let me welcome Riak ON!, the first company planning to offer a hosted Riak solution or Riak-as-a-Service.

Second, I’d like to ask for your help in answering the question that pops into my mind every time I think about Data-as-a-Service: leaving aside the benefits of managed services, in which scenarios can a Data-as-a-Service be used when the application layer is not colocated[1]?

  1. A different way to formulate this question is: what apps can tolerate the WAN latency and network failures? Obviously these questions do not apply to services like Amazon Web Services or Heroku or dotCloud which offer you both Data-as-a-Service and a PaaS or IaaS. 


Improvements and Benchmarks for LevelDB in Riak 1.2

The Basho team has started to investigate and optimize LevelDB, one of the supported storage engines for Riak and the engine behind Riak 2i, and the results are already impressive:

  • reduced stalls (from 10-90s every 3-5min to 10-30s every 2h)
  • increased throughput (from 400 ops/s to 2000 ops/s)
  • a better solution for dealing with an infinite loop during compaction against a corrupted data block
  • LevelDB bloom filter for quickly identifying keys that don’t exist in the data store
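
The last bullet is worth a small illustration. A Bloom filter answers "is this key possibly stored?" from a compact bit array, so a lookup for a missing key can often skip the disk read entirely. The sketch below is a toy version and not LevelDB's implementation; the class name, bit-array size, and hashing scheme are assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means the key is definitely absent, so the read can be skipped.
        return all(self.bits[pos] for pos in self._positions(key))


bloom = BloomFilter()
for key in ("user:1", "user:2", "user:3"):
    bloom.add(key)
```

A `True` from `might_contain` still requires a real lookup, because unrelated keys can set the same bits; only a `False` is conclusive.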

The original post also shows charts of the throughput and maximum latency measured with LevelDB in Riak 1.1 vs. Riak 1.2.



Alex Sicular's Recap of Ricon 2012, a Distributed Systems Conference for Developers

While in conference mode[1] I’m like a sponge, but I’m no good at turning my chaotic notes into a format that is usable by anyone else.

Alex Sicular has done a great job writing down his thoughts about Basho’s fantastic Ricon 2012, and linking to his post makes me feel less guilty for not being able to post mine; I’m learning to do better at the next events:

Chatter by conference attendees left me convinced that Ricon was a success. Ricon was well-executed, well-attended and actually interesting. But more importantly, it was relevant. For those of us at the conference, we actually work in this space. We are interested in the ongoing development of distributed solutions to a number of problems. The conference delivered on creating a space that brought us together to share solutions and learn about continuing advancements. For a new conference to have a successful maiden voyage is no small feat in my book. I, for one, am looking forward to the next one.

My only contribution to Alex Sicular’s great recap is to provide some links to the talks his blog post refers to:

Joe Hellerstein: Programming Principles for a Distributed Era

The PDF can be downloaded from here

Eric Brewer: Advancing Distributed Systems

Russell Brown and Sean Cribbs: Data Structures in Riak

Bryan Fink: Riak Pipe: Distributed Processing System

Ryan Zezeski: Yokozuna: Riak + Solr

More presentation slides can be found on the official Ricon 2012 site.

  1. My thanks again to the Basho team for inviting me to Ricon 2012 and also to the DataStax team for the Cassandra Summit invitation.



YCSB Benchmark Results for Cassandra, HBase, MongoDB, MySQL Cluster, and Riak

A new round of YCSB benchmarks put together by the team at Altoros Systems Inc., this time run on Amazon EC2 and covering Cassandra, HBase, MongoDB, MySQL Cluster, sharded MySQL, and Riak:

After some of the results had been presented to the public, some observers said MongoDB should not be compared to other NoSQL databases because it is more targeted at working with memory directly. We certainly understand this, but the aim of this investigation is to determine the best use cases for different NoSQL products. Therefore, the databases were tested under the same conditions, regardless of their specifics.

Teaser: HBase got the best results in most of the benchmarks (with flush turned off though). And I’m not sure the setup included the latest HBase read improvements from Facebook.



Rolling With Eventual Consistency or the Pros and Cons of a Dynamo Style Key-Value Store

Great educational post by Casey Rosenthal on Basho’s blog about the radically different approach to data modeling required when using non-relational storage engines or non-queryable data models.

In a previous post I wrote about the different mindset that a software engineer should have when building for a key-value database as opposed to a relational database. When working with a relational database, you describe the model first and then query the data later. With a key-value database, you focus first on what you want the result of the query to look like, and then work backward toward a model.

A different way to look at it is that the advantage of a Dynamo-style, highly available key-value store doesn’t come for free. In the world of distributed systems there’s always a trade-off: you need to carefully choose each component of the architecture to match the requirements, but also be aware of the concessions or complexity you’ll have to accept in other parts of the system.
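
To make the "work backward from the query" idea in the quote concrete, here is a minimal sketch that uses a plain dictionary as a stand-in for a key-value bucket. The key naming scheme and the comment structure are hypothetical; the point is only that the write path maintains exactly the document the read path will ask for.

```python
import json

store = {}  # stand-in for a key-value bucket: string key -> serialized value


def add_comment(post_id, author, text):
    """Write path: maintain the exact document the read path will need."""
    key = f"post:{post_id}:comments"
    comments = json.loads(store.get(key, "[]"))
    comments.append({"author": author, "text": text})
    store[key] = json.dumps(comments)


def comments_for_post(post_id):
    """Read path: a single key lookup, no joins or secondary queries."""
    return json.loads(store.get(f"post:{post_id}:comments", "[]"))


add_comment(42, "ann", "Nice post")
add_comment(42, "bob", "Agreed")
```

The read is a single fetch, but the write became a read-modify-write cycle. In an eventually consistent store, two concurrent writers can each update an older copy, so the application has to reconcile conflicting versions. That is one example of the complexity being shifted elsewhere.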



Using Riak as Cache Layer

Sean Cribbs explains how to use Riak as a caching solution:

  1. Bitcask or Memory backends
  2. The possibility of configuring the cluster for lower guarantees of per-key availability

Then benchmark the system for your scenario.
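
As a rough illustration of the second point, one way to trade per-key availability for speed is to keep fewer replicas of each cached value and require fewer replies per request. The sketch below only builds the bucket-properties request and sends nothing. The bucket name and the chosen values are assumptions, not Sean Cribbs' recommendations, and the storage backend itself is selected in the node configuration, which is not shown here.

```python
import json


def cache_bucket_props(n_val=1, r=1, w=1):
    """Bucket properties for a cache-like bucket (illustrative values only)."""
    if not (1 <= r <= n_val and 1 <= w <= n_val):
        raise ValueError("r and w must be between 1 and n_val")
    return {"props": {"n_val": n_val, "r": r, "w": w}}


def props_request(bucket, props):
    """Describe the HTTP request that would apply the properties."""
    return {
        "method": "PUT",
        "path": f"/buckets/{bucket}/props",
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(props, sort_keys=True),
    }


# Hypothetical bucket name.
request = props_request("page_cache", cache_bucket_props())
```

With `n_val` set to 1, each cached value lives on a single node, so losing that node means a cache miss rather than lost source data. That is acceptable for a cache, but not for a system of record.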



Doing Redundant Work to Speed Up Distributed Queries

Great post by Peter Bailis looking at how some systems are reducing tail latency by distributing reads across nodes:

Open-source Dynamo-style stores have different answers. Apache Cassandra originally sent reads to all replicas, but CASSANDRA-930 and CASSANDRA-982 changed this: one commenter argued that “in IO overloaded situations” it was better to send read requests only to the minimum number of replicas. By default, Cassandra now sends reads to the minimum number of replicas 90% of the time and to all replicas 10% of the time, primarily for consistency purposes. (Surprisingly, the relevant JIRA issues don’t even mention the latency impact.) LinkedIn’s Voldemort also uses a send-to-minimum strategy (and has evidently done so since it was open-sourced). In contrast, Basho Riak chooses the “true” Dynamo-style send-to-all read policy.
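
The difference between the two read policies can be sketched with a small simulation. This is a toy model under stated assumptions (independent replica latencies, an occasional slow reply, no extra load from the redundant requests), not a description of how Cassandra, Voldemort, or Riak are implemented.

```python
import random


def read_latency(replica_latencies, r, send_to_all):
    """Latency of one read that needs r replies.

    send_to_all=True : contact every replica, finish on the r-th fastest reply.
    send_to_all=False: contact only the first r replicas, wait for all of them.
    """
    contacted = replica_latencies if send_to_all else replica_latencies[:r]
    return sorted(contacted)[r - 1]


def percentile(values, pct):
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(len(ordered) * pct / 100))
    return ordered[index]


def simulate(trials=10_000, n=3, r=2, seed=1):
    rng = random.Random(seed)
    all_lat, min_lat = [], []
    for _ in range(trials):
        # Assumed latency model: mostly fast replies with an occasional slow one.
        sample = [rng.expovariate(1.0) * (20 if rng.random() < 0.05 else 1)
                  for _ in range(n)]
        all_lat.append(read_latency(sample, r, send_to_all=True))
        min_lat.append(read_latency(sample, r, send_to_all=False))
    return percentile(all_lat, 99), percentile(min_lat, 99)


p99_send_to_all, p99_send_to_minimum = simulate()
```

In this model send-to-all can never be slower for a given read, because the r-th fastest of all replies arrives no later than the slowest of any r chosen replicas. What the model leaves out is the cost side: the redundant requests add load, which is the "IO overloaded situations" argument mentioned in the quote.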



From MongoDB to Riak at Shareaholic

Robby Grossman talked at the Boston Riak meetup about Shareaholic’s migration from MongoDB to Riak, their requirements, and their evaluation of the top contenders: HBase, Cassandra, and Riak.

Why not MongoDB?

  • working set needs to fit in memory
  • global write lock blocks all queries despite not having transactions/joins
  • standbys not “hot”

The pros and cons of HBase, Cassandra, and Riak are listed in bullet-point form in the slides.