6 reasons Adku prefers Cassandra to HBase:
- Single point of failure
- Hot spot problem
- Simpler, Hackable
- Community support
Before jumping to any conclusions make sure you read the disclaimer:
While these decisions apply to Adku, they might not apply to your situation. Always do your own investigation and experimentation before choosing any large part of your system.
Update: JD Cryans commented on the points listed above (thanks JD):
This comparison reminds me of the pain we went through in late 2009, when lots of similar comparisons came out from all sides: the “NoSQL war”. Unfortunately, as we all found out, no one wins.
But let’s look at the points mentioned in this post.
Reliability: As far as I can tell that’s not a reliability test. The first thing that raises questions is the large number of crashes of the region servers. Considering the data set used (1 million rows of the full “Alice in Wonderland” text) is small compared to the ones other HBase users (StumbleUpon, Mozilla) are handling, that would point to a configuration problem that wasn’t taken care of.
One could say it’s because HBase is hard to configure or that the default configurations aren’t good, and to some extent I agree, but you don’t quantify reliability based on these.
Hot Spot Problem: This point is an interesting one, and most likely falls under the disclaimer.
Distribution based on timestamp row keys will be better with Cassandra. But usually when using timestamps you also want range scans, which are impossible with hashing. For example, OpenTSDB provides a very efficient way to store time series by using a clever row key design. A design that you’ll probably also use if you need scans in Cassandra.
Not to mention that using MapReduce will require sorted row keys anyway.
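To make the trade-off concrete, here is a minimal Python sketch of the three key styles discussed above: a plain timestamp key, a hashed key, and a composite key with a metric prefix followed by an hour-aligned timestamp. The composite layout is only loosely modeled on the idea behind OpenTSDB's row keys and is not its actual schema. The function names, the 2-byte metric id, and the split points are all assumptions made for illustration.

```python
import hashlib
import struct
from bisect import bisect_left, bisect_right
from collections import Counter

HOUR = 3600

def timestamp_key(ts):
    # Plain big-endian timestamp: keys sort by time, so new writes pile up at the tail.
    return struct.pack(">I", ts)

def hashed_key(ts):
    # Hashed placement spreads writes, but neighbouring timestamps are no longer adjacent.
    return hashlib.md5(struct.pack(">I", ts)).digest()

def composite_key(metric_id, ts):
    # Hypothetical layout: 2-byte metric id, then the timestamp rounded down to the hour.
    return struct.pack(">HI", metric_id, ts - ts % HOUR)

def region_for(key, split_points):
    # Range partitioning: index of the region whose key range contains `key`.
    return bisect_right(split_points, key)

def scan(sorted_keys, start, stop):
    # Range scan over [start, stop) on an ordered key space.
    return sorted_keys[bisect_left(sorted_keys, start):bisect_left(sorted_keys, stop)]

# Six "recent" hourly writes, all later than any existing split point.
now = 1000 * HOUR
recent = [now + h * HOUR for h in range(6)]

# Assumed split points left behind by older data.
ts_splits = [timestamp_key(h * HOUR) for h in (250, 500, 750)]
plain_load = Counter(region_for(timestamp_key(ts), ts_splits) for ts in recent)

# With a metric prefix, regions split on metric boundaries instead of on time.
metric_splits = [composite_key(2, 0), composite_key(3, 0)]
composite_load = Counter(
    region_for(composite_key(m, ts), metric_splits)
    for m in (1, 2, 3)
    for ts in recent
)

# A time-range scan for one metric is still a contiguous slice of sorted keys.
keys = sorted(composite_key(m, ts) for m in (1, 2, 3) for ts in recent)
window = scan(keys, composite_key(2, recent[1]), composite_key(2, recent[4]))
```

In this sketch `plain_load` puts every recent write in the last region, `composite_load` spreads the same hours evenly over three regions, and `window` still returns the three consecutive hours requested for metric 2. Sorting by `hashed_key` would balance the writes too, but the scan would no longer be possible.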
Community Support: Comparing communities only based on the number of IRC users is too much of a simplification. Someone looking to use an open source project should spend some time getting to know and interact with the users before stating that “one community is more helpful” than the other — a message that could also be perceived as disrespectful.
There are also a couple of points that are mentioned in the post even if HBase is the “winner” (MapReduce) or the feature is not a hard requirement (consistency).
I left performance last as the post mentions similar write performance results. But there is too little information about the benchmark to be able to comment on it. At first glance those results look weird considering they weren’t using a Hadoop version that supports append, which, as shown by the original YCSB paper, would make quite a difference.
After the Adku blog came out, Edward Capriolo wrote this response (rant?) to all who try to do the same as them and I think it’s worth the read: http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/myths_rumors_fud_hate_nosql
Original title and link: Adku’s Choice: Cassandra or HBase (NoSQL databases © myNoSQL)