NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter



From Cassandra to Riak at

A couple of confusing things in this post:

The nice thing about Cassandra was the data model. Super columns allowed us to store metadata for a resource as needed. […] Concurrency issues were also not a bother. We could do simultaneous updates to columns and super columns and not worry about data consistency issues. […] When looking for alternatives Riak was our first choice primarily because of it being in Erlang and since it had a map-reduce option which looked seriously promising.

I don’t see any connection between these. Going from a granular data model supporting column level operations to an key-value store with opaque values doesn’t really add up.

Of the back-ends available this has worked best for us giving a consistent performance along with being reasonable on the resource usage.

This seems a bit contradictory with what was said about the new default Riak storage Bitcask in the Innostore and Bitcask comparison.

Anyone able to clarify these? (nb I’m not saying something is wrong, but I’d like to better understand the details). For now, Mozilla story Cassandra, HBase, Riak: Choosing the Right Solution seems to be much better documented.

Update: Thanks to Jebu Ittiachen things are a clearer now:

My issues with Cassandra and with Bitcask under Riak were with how they behaved in terms of their memory consumption. In the presence of ever increasing number of keys like the tweets which keep coming in both of them would eat up all the memory available on my servers. Cassandra I guess because of its per SSTable cache of keys and Bitcask because it maintains all keys in memory. This initially being the reason for me looking out for a different store than Cassandra. I should mention that in addition to tweets other data is also managed in Cassandra / Riak.

What I was trying to convey is how something that was easily modeled in Cassandra could still be mapped into Riak and possibly be to an advantage given the map-reduce infrastructure.

My preference of innostore over bitcask has purely been seeing how they behave in real use. Bitcask is definitely faster but high in memory usage on the servers. Innostore on the other hand is steady on the memory usage over time.

From Cassandra to Riak at originally posted on the NoSQL blog: myNoSQL