


Riak, Bitcask, Innostore and The Impact of Key Distribution

An interesting finding from Kresten Krab Thorup[1] on how key distribution affects performance:

Innostore uses a B-tree, and we realized that it was really suffering from the random keys, because it then needs to do I/O on random nodes of the B-tree.

So we changed the keys to be <<timestamp>>:<<random-bits>>, i.e., such that successive writes have keys that are lexicographically close. The random bits are there to make the chance of a conflict small enough.
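A minimal Python sketch of the <<timestamp>>:<<random-bits>> key scheme follows. The function name `make_key`, the microsecond resolution, the 20-digit zero padding and the 8 random bytes are illustrative assumptions, not details of the original setup. The padding matters: as plain strings, "999" sorts after "1000", so unpadded timestamps would not keep successive keys lexicographically close.

```python
import os
import time
from typing import Optional


def make_key(ts_micros: Optional[int] = None, random_bytes: int = 8) -> str:
    """Build a hypothetical key of the form <timestamp>:<random-bits>.

    The timestamp is zero-padded to 20 digits so that string
    (lexicographic) order matches numeric, i.e. chronological, order.
    The random suffix makes two keys created in the same microsecond
    unlikely to collide.
    """
    if ts_micros is None:
        # Wall-clock microseconds since the Unix epoch (assumed resolution).
        ts_micros = time.time_ns() // 1_000
    return f"{ts_micros:020d}:{os.urandom(random_bytes).hex()}"


if __name__ == "__main__":
    for _ in range(3):
        print(make_key())
    # Padding keeps string order equal to time order.
    print(make_key(ts_micros=999) < make_key(ts_micros=1_000))  # True
```

Keys created within the same microsecond are ordered only by their random suffix, and a wall clock can occasionally step backwards, so this scheme gives keys that are close in time order rather than strictly increasing; for B-tree locality, "close" is what counts.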

Using such keys causes the underlying B-tree to write to only a few nodes at a time; ideally, Innostore only needs to keep in memory the tree nodes on the path from the root of the tree to the node currently being added to.
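To illustrate why key order matters for a B-tree, the Python sketch below counts how many distinct leaf pages a batch of inserts lands in. It is a deliberately simplified model, and an assumption on our part rather than Innostore's actual layout: integers stand in for keys, existing keys are packed into fixed-size leaves, and leaves do not split during the batch. The helper `leaves_touched` is hypothetical.

```python
import bisect
import random


def leaves_touched(existing: list[int], new_keys: list[int], leaf_size: int) -> int:
    """Count distinct leaf pages that a batch of inserts would land in.

    Simplified, hypothetical model: `existing` keys are sorted and packed
    into leaves of `leaf_size` keys each, and no leaf splits while the
    batch is inserted.
    """
    existing = sorted(existing)
    # The smallest key of every leaf except the first acts as a separator.
    separators = existing[leaf_size::leaf_size]
    # bisect_right gives the index of the leaf whose key range contains k.
    return len({bisect.bisect_right(separators, k) for k in new_keys})


if __name__ == "__main__":
    rng = random.Random(42)
    old = list(range(0, 1_000_000, 10))  # 100,000 existing keys
    leaf = 100                           # keys per leaf -> 1,000 leaves
    random_batch = [rng.randrange(1_000_000) for _ in range(1_000)]
    ordered_batch = list(range(1_000_000, 1_001_000))  # time-ordered keys
    print("random keys:      ", leaves_touched(old, random_batch, leaf))
    print("time-ordered keys:", leaves_touched(old, ordered_batch, leaf))  # 1
```

In this model the 1,000 random keys scatter across several hundred of the 1,000 leaves, each a potential random I/O, while the time-ordered batch lands entirely in the rightmost leaf, which is the root-to-current-node path described above.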

  1. Kresten Krab Thorup: Programmer, Entrepreneur, Programmer, Scientist, Programmer, CTO at Trifork  

Original title and link: Riak, Bitcask, Innostore and The Impact of Key Distribution (NoSQL databases © myNoSQL)