


Bitcask: All content tagged as Bitcask in NoSQL databases and polyglot persistence

Riak Getting LevelDB as Storage Engine

After Innostore and Bitcask, the Basho team is now experimenting with integrating Google’s LevelDB as a storage engine for Riak. Preliminary results look promising:

For most Riak users, Bitcask is the obvious right storage engine to use. It provides low latency, solid predictability, is robust in the face of crashes, and is friendly from a filesystem backup point of view. However, it has one notable limitation: total RAM use depends linearly (though via a small constant) on the total number of objects stored. For this reason, Riak users that need to store billions of entries per machine sometimes use Innostore (our wrapper around embedded InnoDB) as their storage engine instead. InnoDB is a robust and well-known storage engine, and uses a more traditional design than Bitcask which allows it to tolerate a higher maximum number of items stored on a given host.

It appears that LevelDB may become a preferred choice for Riak users whose data set has massive numbers of keys and therefore is a poor match with Bitcask’s model. Performance aside, it compares favorably to InnoDB on other issues such as permissive license and operational usability.
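To make the "RAM use depends linearly on the number of objects" limitation concrete, here is a back-of-the-envelope sketch. The function name, the 32-byte key size and the 40-byte per-entry overhead are all illustrative assumptions, not figures published by Basho; plug in your own measurements.

```python
def estimate_keydir_ram_bytes(num_keys, avg_key_bytes, per_entry_overhead_bytes):
    """Estimate RAM needed by an in-memory key index.

    Assumes every key is held in RAM together with a fixed-size
    bookkeeping record. Both sizes are inputs because the real
    constants depend on the implementation and its version.
    """
    return num_keys * (avg_key_bytes + per_entry_overhead_bytes)


if __name__ == "__main__":
    GIB = 1024 ** 3
    # Hypothetical numbers: 32-byte keys, 40 bytes of overhead per entry.
    for num_keys in (10_000_000, 100_000_000, 1_000_000_000):
        ram = estimate_keydir_ram_bytes(num_keys, 32, 40)
        print(f"{num_keys:>13,} keys -> {ram / GIB:6.1f} GiB")
```

The point is only the shape of the curve: ten times more keys means ten times more RAM, no matter how small the values on disk are.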

Original title and link: Riak Getting LevelDB as Storage Engine (NoSQL database©myNoSQL)


Riak, Bitcask, Innostore and The Impact of Key Distribution

An interesting finding from Kresten Krab Thorup[1] on how key distribution affects performance:

Innostore uses a B-tree, and we realized that it was really suffering from the random keys, because it then needs to do I/O on random nodes of the B-tree.

So we changed the keys to be <<timestamp>>:<<random-bits>>, i.e., such that successive writes have keys that are lexicographically close. The random bits are there to make the chance of conflict small enough.

Using such keys causes the underlying B-tree to write to only a few nodes at a time, and ideally innostore only needs to keep in memory the tree nodes on the path from the root of the tree to the node currently being added to.
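A minimal sketch of such a key scheme, assuming a nanosecond timestamp and a hex-encoded random suffix. The function name, the 20-digit padding and the 4 random bytes are my own choices for illustration, not what Trifork actually used.

```python
import os
import time


def make_time_ordered_key(now_ns=None, random_bytes=4):
    """Build a key of the form <timestamp>:<random-bits>.

    The timestamp is zero-padded to a fixed width so that lexicographic
    order matches chronological order; the random suffix only lowers the
    chance of two writers producing the same key.
    """
    if now_ns is None:
        now_ns = time.time_ns()
    suffix = os.urandom(random_bytes).hex()
    return f"{now_ns:020d}:{suffix}"


if __name__ == "__main__":
    keys = [make_time_ordered_key(now_ns=t) for t in (5, 40, 300, 2000)]
    # Sorting as strings keeps the keys in write order.
    print(sorted(keys) == keys)
```

The zero padding matters: without it, "40" would sort after "300" and successive writes would again land on scattered B-tree nodes.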

  1. Kresten Krab Thorup: Programmer, Entrepreneur, Programmer, Scientist, Programmer, CTO at Trifork  



Storing Part of Riak Object Value in Memory

The disadvantage of Riak’s Key Filter approach is that you end up with highly domain-specific keys, which can be hard to reference, especially if you need to update keys to allow querying new aspects of the data: If you need to change your existing keys, references to these keys need to be updated too. This is hard to do atomically when you have a key-value store like Riak. Even worse, if data changes you need to update the key, and – again – the pointers to the key, if you have any.

I was convinced that natural keys were always domain specific.

What if… not only the key, but also part of the object’s value could be stored in memory? Then you could write queries that used the object’s memory only and get good performance.

You’ll probably get a document database instead of a key-value store. Next you’ll call it CouchDB or MongoDB instead of Riak. Like this guy.



Riak Bitcask Explained

From the category “if you didn’t read the paper, here is a good summary of it”, Todd Hoff’s notes about Riak’s Bitcask.

Eric Brewer (CAP theorem) came up with the idea for Bitcask by observing that if you have the capacity to keep all keys in memory, which is quite likely on modern systems, you can have a storage system that is relatively easy to design and implement. The commit log can be used as the database itself, providing atomicity and durability. Only one write is required to persist the data; separate writes to a data file and a commit log are not necessary.
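The "log is the database" idea can be sketched in a few lines. This is a toy under stated assumptions, not Bitcask's real on-disk format or API: the class name, the 8-byte header layout and the single-file design are hypothetical, and there are no checksums, deletes or merging.

```python
import os
import struct
import tempfile

HEADER = struct.Struct(">II")  # key length, value length (assumed layout)


class TinyLogStore:
    """Append-only log plus an in-memory index of key -> value location."""

    def __init__(self, path):
        self.path = path
        self.keydir = {}  # key -> (value offset, value size)
        self.log = open(path, "a+b")

    def put(self, key: bytes, value: bytes):
        self.log.seek(0, os.SEEK_END)
        record_start = self.log.tell()
        # One append persists the record; there is no separate data file.
        self.log.write(HEADER.pack(len(key), len(value)) + key + value)
        self.log.flush()
        os.fsync(self.log.fileno())
        value_offset = record_start + HEADER.size + len(key)
        self.keydir[key] = (value_offset, len(value))

    def get(self, key: bytes):
        entry = self.keydir.get(key)
        if entry is None:
            return None
        offset, size = entry
        self.log.seek(offset)
        return self.log.read(size)

    def close(self):
        self.log.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = TinyLogStore(os.path.join(tmp, "data.log"))
        store.put(b"user:1", b"alice")
        store.put(b"user:1", b"bob")  # newer record supersedes the old one
        print(store.get(b"user:1"))
        store.close()
```

A read costs one dictionary lookup and one seek, which is where the low, predictable latency comes from. The price is that every key lives in the dictionary, so memory grows with the key count.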

Since Riak 0.11.0, Bitcask is the default storage engine.



Krati: A Persistent High-Performance Data Store

Krati is a simple persistent data store with very low latency and high throughput. It is designed for easy integration with read-write-intensive applications with little effort in tuning configuration, performance and JVM garbage collection.

Sounds a bit like Bitcask. Can anyone point out at least one of the major differences?

From the project page:

  • supports varying-length data array
  • supports key-value data store access
  • performs append-only writes in batches
  • has write-ahead redo logs and periodic checkpointing
  • has automatic data compaction (i.e. garbage collection)
  • is memory-resident (or OS page cache resident) yet persistent
  • allows single-writer and multiple readers
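The project page doesn't say how Krati's compaction works, so take the following only as a generic sketch of what compacting an append-only log means: keep the newest value per key and drop everything it superseded. The function name and the use of None as a deletion marker are assumptions for illustration.

```python
def compact(records):
    """Keep only the most recent value for each key.

    `records` is an iterable of (key, value) pairs in the order they were
    appended; a value of None marks a deletion (a tombstone).
    """
    latest = {}
    for key, value in records:
        # Re-inserting moves the key to the end, preserving last-write order.
        latest.pop(key, None)
        latest[key] = value
    return [(k, v) for k, v in latest.items() if v is not None]


if __name__ == "__main__":
    log = [("a", 1), ("b", 2), ("a", 3), ("b", None), ("c", 4)]
    print(compact(log))
```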



From Cassandra to Riak at

A couple of confusing things in this post:

The nice thing about Cassandra was the data model. Super columns allowed us to store metadata for a resource as needed. […] Concurrency issues were also not a bother. We could do simultaneous updates to columns and super columns and not worry about data consistency issues. […] When looking for alternatives Riak was our first choice primarily because of it being in Erlang and since it had a map-reduce option which looked seriously promising.

I don’t see any connection between these. Going from a granular data model supporting column-level operations to a key-value store with opaque values doesn’t really add up.

Of the back-ends available this has worked best for us giving a consistent performance along with being reasonable on the resource usage.

This seems to contradict what was said about Bitcask, the new default Riak storage, in the Innostore and Bitcask comparison.

Is anyone able to clarify these? (nb I’m not saying something is wrong, but I’d like to better understand the details). For now, the Mozilla story, Cassandra, HBase, Riak: Choosing the Right Solution, seems to be much better documented.

Update: Thanks to Jebu Ittiachen, things are clearer now:

My issues with Cassandra and with Bitcask under Riak were with how they behaved in terms of their memory consumption. In the presence of ever increasing number of keys like the tweets which keep coming in both of them would eat up all the memory available on my servers. Cassandra I guess because of its per SSTable cache of keys and Bitcask because it maintains all keys in memory. This initially being the reason for me looking out for a different store than Cassandra. I should mention that in addition to tweets other data is also managed in Cassandra / Riak.

What I was trying to convey is how something that was easily modeled in Cassandra could still be mapped into Riak and possibly be to an advantage given the map-reduce infrastructure.

My preference of innostore over bitcask has purely been seeing how they behave in real use. Bitcask is definitely faster but high in memory usage on the servers. Innostore on the other hand is steady on the memory usage over time.



Details About Riak Innostore and Bitcask Backends

With the recent 0.11.0 release, Riak switched its default storage backend from embedded Innostore to Bitcask.

Andy Gross and johne had a very interesting conversation about the differences between Innostore and Bitcask Riak backend stores:

innostore currently creates a file per bucket/partition combo, but all other backends use one file per partition

unless you really want innostore, we recommend you use bitcask

one other thing with buckets: buckets don’t consume any resources as long as they use the bucket defaults - either the stock riak defaults or ones you set in your app.config

buckets that change some of those defaults take up a small amount of space in the ring data structure that’s gossiped around



Release: Riak 0.11.0, Defaults to Bitcask storage

The Basho team has ☞ announced the release of Riak 0.11.0, which features a couple of enhancements and bug fixes. More importantly, the new Riak 0.11.0 uses the in-house developed Bitcask storage as its default, in place of the embedded InnoDB store and other previously available options.

As a side note, Chef cookbooks for Riak have also been ☞ updated and Basho also released their internal ☞ benchmark code.

Bitcask was ☞ announced a while ago as a solution developed to address the following goals:

  • low latency per item read or written
  • high throughput, especially when writing an incoming stream of random items
  • ability to handle datasets much larger than RAM w/o degradation
  • crash friendliness, both in terms of fast recovery and not losing data
  • ease of backup and restore
  • a relatively simple, understandable (and thus supportable) code structure and data format
  • predictable behavior under heavy access load or large volume
  • a license that allowed for easy default use in Riak
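The "crash friendliness" goal follows fairly naturally from an append-only design: after a crash you can rebuild the in-memory index by scanning the log and ignoring a torn record at the tail. Here is a sketch under assumptions; the record layout and function names are hypothetical and this is not Bitcask's real format or recovery code.

```python
import io
import struct

HEADER = struct.Struct(">II")  # key length, value length (assumed layout)


def encode_record(key: bytes, value: bytes) -> bytes:
    return HEADER.pack(len(key), len(value)) + key + value


def rebuild_index(log) -> dict:
    """Scan a log from the start and return key -> (value offset, size).

    Later records overwrite earlier ones, so the index ends up pointing at
    the newest value. A truncated final record (for example from a crash
    mid-write) is ignored.
    """
    index = {}
    log.seek(0)
    while True:
        record_start = log.tell()
        header = log.read(HEADER.size)
        if len(header) < HEADER.size:
            break
        key_len, value_len = HEADER.unpack(header)
        body = log.read(key_len + value_len)
        if len(body) < key_len + value_len:
            break  # partial record at the tail: stop here
        key = body[:key_len]
        index[key] = (record_start + HEADER.size + key_len, value_len)
    return index


if __name__ == "__main__":
    data = encode_record(b"a", b"1") + encode_record(b"a", b"22")
    data += encode_record(b"b", b"333")[:-1]  # simulate a torn final write
    print(rebuild_index(io.BytesIO(data)))
```

Nothing already written is ever modified in place, so the worst a crash can do in this model is leave an incomplete record at the end of the file.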

Jeff Darcy has some ☞ very good things to say about Bitcask, so I’ve spent some time reading the ☞ technical paper (pdf). While I’m not an expert in either Bitcask or BerkeleyDB/Java, I have found the top-level goals and some of the implementation details quite similar, but I’m pretty sure there are some subtle differences, as BerkeleyDB is referred to in the paper (nb maybe it was just about the license).
