riak case study: All content tagged as riak case study in NoSQL databases and polyglot persistence

What is Riak?

From Basho’s blog:

Riak is:

  • A Database
  • A Data Store
  • A key/value store
  • Used by Fortune 100 Companies
  • Used by startups
  • A “NoSQL” database
  • Schemaless and data-type agnostic
  • Written (primarily) in Erlang
  • As distributed as you want and need it to be
  • Scalable
  • Pronounced “REE-ack”
  • Not the best fit for every project and application
  • And much, much more…

So far I’ve heard only about Riak and Mozilla, Riak at, and this atypical Riak usage for a church’s kiosks, but no mentions of Fortune 100 company names. Does anyone know who they are referring to?

Update: please check the comment thread for more details. It looks like the Fortune 100 company Basho is referring to is Comcast.

Original title and link for this post: What is Riak? (published on the NoSQL blog: myNoSQL)


Extensive Riak Benchmarking at Mozilla Test Pilot

Mozilla has previously published their detailed plan and extensive investigation into Cassandra, HBase, and Riak that led them to choose Riak. This time they are publishing extensive Riak benchmark results (against both Riak 0.10 and Riak 0.11 running Bitcask). They are using the Riak benchmarking code, which is included in the list of correct NoSQL benchmarks and performance evaluation solutions. The results, their analysis, and their interpretation are all fascinating.

Our goal in running these studies was, simply put, no surprises. That meant we needed to run studies that profiled:

  1. Latency
  2. Stability, especially for long running tests
  3. Performance when we introduced variable object sizes
  4. Performance when we introduced pre-commit hooks to evaluate incoming data

I guess Mozilla Test Pilot is one of Riak’s most interesting case studies.
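To make the first and third items on that list a bit more concrete, here is a minimal sketch of a latency profile for writes with variable object sizes. It is not Mozilla's benchmark and not the Riak benchmarking code: `InMemoryStore`, `run_put_benchmark`, and `percentile` are hypothetical names, and the in-memory store only stands in for a real client so that the example runs on its own.

```python
import random
import statistics
import time


class InMemoryStore:
    """Hypothetical stand-in for a key/value client; not a Riak API."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value


def percentile(sorted_samples, fraction):
    """Simple index-based percentile of an already sorted list."""
    if not sorted_samples:
        raise ValueError("no samples")
    index = min(len(sorted_samples) - 1, int(fraction * len(sorted_samples)))
    return sorted_samples[index]


def run_put_benchmark(store, operations, min_size, max_size, seed=0):
    """Time each put of a randomly sized value and summarize the latencies."""
    rng = random.Random(seed)
    latencies = []
    for i in range(operations):
        # Zero-filled payload whose size varies between min_size and max_size.
        value = bytes(rng.randint(min_size, max_size))
        start = time.perf_counter()
        store.put(f"key-{i}", value)
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    return {
        "count": len(latencies),
        "mean": statistics.mean(latencies),
        "p50": percentile(latencies, 0.50),
        "p99": percentile(latencies, 0.99),
        "max": latencies[-1],
    }


if __name__ == "__main__":
    print(run_put_benchmark(InMemoryStore(), 1000, 100, 10_000))
```

Reporting the median, the 99th percentile, and the maximum together is what makes "no surprises" checkable: a mean on its own can hide the occasional very slow request.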


From Cassandra to Riak at

A couple of confusing things in this post:

The nice thing about Cassandra was the data model. Super columns allowed us to store metadata for a resource as needed. […] Concurrency issues were also not a bother. We could do simultaneous updates to columns and super columns and not worry about data consistency issues. […] When looking for alternatives Riak was our first choice primarily because of it being in Erlang and since it had a map-reduce option which looked seriously promising.

I don’t see any connection between these. Going from a granular data model that supports column-level operations to a key-value store with opaque values doesn’t really add up.
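To illustrate what I mean, here is a rough sketch of what a column-level update turns into when the store only sees opaque values. Everything in it is an assumption made for illustration: `OpaqueKVStore` and `update_field` are hypothetical names, JSON is just one possible encoding, and none of this is taken from the post or from a Riak client.

```python
import json


class OpaqueKVStore:
    """Hypothetical key/value store: values are opaque bytes to the store."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value


def update_field(store, key, group, field, value):
    """Emulate a column-level write with a full read-modify-write cycle."""
    raw = store.get(key)
    # The whole value has to be fetched and decoded to change one field.
    doc = json.loads(raw) if raw is not None else {}
    doc.setdefault(group, {})[field] = value
    # The whole value is written back, not just the changed field.
    store.put(key, json.dumps(doc).encode("utf-8"))
    return doc


if __name__ == "__main__":
    store = OpaqueKVStore()
    update_field(store, "resource:1", "meta", "title", "example")
    update_field(store, "resource:1", "meta", "lang", "en")
    print(store.get("resource:1"))
```

With this layout, two writers touching different fields of the same key each rewrite the entire value, so one update can silently replace the other unless the application detects and resolves the conflict itself.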

Of the back-ends available this has worked best for us giving a consistent performance along with being reasonable on the resource usage.

This seems to contradict what was said about Bitcask, the new default Riak storage backend, in the Innostore and Bitcask comparison.

Is anyone able to clarify these? (N.B. I’m not saying something is wrong, but I’d like to better understand the details.) For now, Mozilla’s story, Cassandra, HBase, Riak: Choosing the Right Solution, seems to be much better documented.

Update: Thanks to Jebu Ittiachen, things are clearer now:

My issues with Cassandra and with Bitcask under Riak were with how they behaved in terms of their memory consumption. In the presence of ever increasing number of keys like the tweets which keep coming in both of them would eat up all the memory available on my servers. Cassandra I guess because of its per SSTable cache of keys and Bitcask because it maintains all keys in memory. This initially being the reason for me looking out for a different store than Cassandra. I should mention that in addition to tweets other data is also managed in Cassandra / Riak.

What I was trying to convey is how something that was easily modeled in Cassandra could still be mapped into Riak and possibly be to an advantage given the map-reduce infrastructure.

My preference of innostore over bitcask has purely been seeing how they behave in real use. Bitcask is definitely faster but high in memory usage on the servers. Innostore on the other hand is steady on the memory usage over time.
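The memory behavior Jebu describes follows from keeping every key in an in-memory index while the values live in an append-only log. Below is a toy sketch of that idea, assuming a much simplified layout: `ToyKeydirStore` is a hypothetical class, the log is an in-memory buffer rather than files on disk, and it is not the real Bitcask implementation.

```python
import io


class ToyKeydirStore:
    """Toy append-only log with an in-memory key index (not real Bitcask)."""

    def __init__(self):
        self._log = io.BytesIO()  # stands in for the on-disk data file
        self._keydir = {}         # key -> (offset, length), one entry per live key

    def put(self, key, value):
        # Always append; older versions of the value stay in the log.
        offset = self._log.seek(0, io.SEEK_END)
        self._log.write(value)
        self._keydir[key] = (offset, len(value))

    def get(self, key):
        entry = self._keydir.get(key)
        if entry is None:
            return None
        offset, length = entry
        # One index lookup plus one positioned read.
        self._log.seek(offset)
        return self._log.read(length)

    def index_entries(self):
        """Number of keys held in memory; grows with every new key."""
        return len(self._keydir)

    def log_bytes(self):
        """Total bytes appended so far, including overwritten values."""
        return self._log.seek(0, io.SEEK_END)


if __name__ == "__main__":
    store = ToyKeydirStore()
    for i in range(5):
        store.put(f"tweet:{i}", b"payload")
    print(store.index_entries(), store.log_bytes())
```

Reads are cheap because the index already knows where each value sits, but the index grows with the number of distinct keys. A workload where new keys keep arriving, such as the tweets mentioned above, keeps pushing memory usage up regardless of how large the values are.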



Riak in Production: An Atypical Story

A non-enterprisey and non-twitteresque, but very interesting Riak deployment on a church’s kiosks:

Currently, we are running four Riak nodes (writing out to the Filesystem backend) outside of the three Kiosks themselves. I also have various Riak nodes on my random linux servers because I can use the CPU cycles on my other nodes to distribute MapReduce functions and store information in a redundant fashion.

Please note also the reduced complexity of bringing new kiosks up:

As I bring more kiosks into operation, the distributed map-reduce feature is becoming more valuable. Since I typically run reports during off hours, the kiosks aren’t overloaded by the extra processing power. So far I have been able to roll out a new kiosk within 2 hours of receiving the hardware. Most of this time is spent doing the installation and configuration of the touchscreen.

Atypical I’d say, but definitely exciting!
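For readers who haven't seen the pattern, the off-hours reports described above can be pictured roughly as follows: each node runs a map step over its own data and a single reduce step merges the partial results. This is only a sketch under loud assumptions. The record shape (a `day` field), the node names, and the functions `map_phase`, `reduce_phase`, and `run_report` are all hypothetical, and threads merely stand in for separate Riak nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_phase(records):
    """Map step, run once per node: count local records per day."""
    return Counter(record["day"] for record in records)


def reduce_phase(partials):
    """Reduce step: merge the per-node counts into one report."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def run_report(nodes):
    """Run the map step for every node in parallel, then reduce the results."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        partials = list(pool.map(map_phase, nodes.values()))
    return reduce_phase(partials)


if __name__ == "__main__":
    # Hypothetical data: two kiosks and one spare server, each holding some records.
    nodes = {
        "kiosk-1": [{"day": "sun"}, {"day": "sun"}],
        "kiosk-2": [{"day": "wed"}],
        "server-1": [{"day": "sun"}, {"day": "wed"}],
    }
    print(run_report(nodes))
```

Because the map step only touches local data, adding a node adds both storage and processing capacity for the report, which matches the point made in the quote about new kiosks.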