Two posts by Oliver Meyn on measuring the performance of two HBase clusters—first results on the original cluster and results on the upgraded cluster—using org.apache.hadoop.hbase.PerformanceEvaluation, the resulting performance charts, Ganglia charts, and some thoughts and feedback from the HBase community.
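For reference, the PerformanceEvaluation tool ships with HBase and is run from the command line; a typical invocation (a sketch — exact options vary by HBase version) looks like:

```shell
# Write test rows with 10 concurrent clients (by default one map task
# per client), then time random reads over the same keyspace. Cluster
# connection details come from the hbase-site.xml on the classpath.
hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 10
hbase org.apache.hadoop.hbase.PerformanceEvaluation randomRead 10
```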
Original title and link: Performance Evaluation of HBase and How Hardware Changes Results ( ©myNoSQL)
After a very long silence (my last post about Hypertable dates back to Oct. 2010: NoSQL database architectures and Hypertable), there seems to be a bit of a revival in the Hypertable space:
- there are new packages of (commercial) services (PR announcement):
- Uptime support subscription
- Training and certification
- Commercial license
- it seems like Hypertable has a customer in Rediff.com (India)
- it is taking yet another stab at HBase performance
While I’m somewhat glad that Hypertable didn’t hit the deadpool, it’s quite disappointing that they are still trying to use this old and completely useless strategy of attacking another product in the market.
There are probably many marketers out there encouraging companies to use this old trick of getting attention by attacking the market leader¹. And one of the simplest ways of doing that is by saying “mine is bigger than yours”.
But these days this strategy isn’t working anymore for quite a few reasons:
- benchmarks are, most of the time, incorrect, so attention ends up pointed in the wrong direction
- For existing users, performance issues are already known. They are also known by the core developers, who are constantly working to address them. So nothing new, just some angry users of the attacked product.
- For new users, performance is just one aspect of the decision. Most of the time, it’s one of the last considered. Community, support, adoption, and well-known case studies are much more important.
Attacking competitors based on feature checklists might be slightly effective in attracting a bit of attention, but it’s not a strategy for winning users and customers and growing a community.
1. HBase might not be the market leader, but it is definitely one of the NoSQL databases that have seen a few very large deployments. ↩
Original title and link: Hypertable Revival. Still the wrong strategy ( ©myNoSQL)
HPCC Systems’ 4-node cluster sorts 100 gigabytes in 98 seconds and is 25% faster than a 20-node Hadoop cluster.
Results achieved in December 2011 show that an HPCC Systems four-node Thor cluster took only 98 seconds to complete a Terasort with a job size of 100 gigabytes (GB) on a cluster five times smaller than Hadoop’s. The HPCC Systems four-node cluster was comprised of one (1) Dell PowerEdge C6100 2U server with Intel® Xeon® processors E5675 series, 48GB of memory, and 6 x 146GB SAS HDDs. The Dell C6100 houses four nodes inside the 2U enclosure. The previous leader ran the same Terasort benchmark in 130 seconds on a 20-node Hadoop cluster using equivalent node hardware. HPCC Systems is an Open Source, enterprise-proven Big Data analytics-processing platform.
Thus Armando Escalante (SVP and CTO of LexisNexis Risk Solutions and head of HPCC Systems) concludes:
These results demonstrate that HPCC Systems is a leader in Big Data processing
Now switching to a post on MapR’s blog:
Recently a world record was claimed for a Hadoop benchmark. […] We were surprised to see that this world record was for a TeraSort benchmark on 100GB of data. TeraSort is a standard benchmark and the name is derived from “sorting a terabyte”. Any record claim for sorting a 100GB dataset across a 20 node cluster with 10 times as much memory is comical. The test is named TeraSort not GigaSort.
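MapR’s objection is easy to check with back-of-the-envelope arithmetic. Assuming the 20 Hadoop nodes used the same 48GB of memory per node as the HPCC hardware described above (the announcement says the node hardware was equivalent), the whole dataset fits in aggregate RAM many times over:

```python
# Back-of-the-envelope check of the MapR objection: on a 20-node
# cluster, the 100 GB dataset fits entirely in aggregate RAM, so the
# "TeraSort" never has to sort out of core. The 48 GB per node is
# taken from the HPCC hardware description; the Hadoop cluster is
# said to have used equivalent node hardware.
dataset_gb = 100
nodes = 20
ram_per_node_gb = 48

aggregate_ram_gb = nodes * ram_per_node_gb   # 960 GB in total
ratio = aggregate_ram_gb / dataset_gb        # roughly the "10 times as much memory"
print(f"aggregate RAM: {aggregate_ram_gb} GB, {ratio:.1f}x the dataset size")
```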
Original title and link: Hadoop, HPCC, MapR and the TeraSort Benchmark ( ©myNoSQL)
Though it looks like mongo-store demonstrates the best overall performance, it should be noted that a mongo server is unlikely to be used solely for caching (the same applies to redis); it is likely that non-caching related queries will be running concurrently on a mongo/redis server, which could affect the suitability of these benchmarks.
I’m not a Rails user, so please take these with a grain of salt:
- without knowing the size of the cached objects, at 20,000 iterations most probably neither MongoDB nor Redis had to persist to disk. This means that all three of memcached, MongoDB, and Redis stored data in memory only¹
- if no custom object serialization is used by any of the memcached, MongoDB, or Redis caches, then the performance difference is mostly down to the performance of each driver
- it should not be a surprise to anyone that the size of the cached objects can and will influence the results of such benchmarks
- there doesn’t seem to be any concurrent access to the caches. Concurrent access and concurrent updates of caches are real-life scenarios, and not including them in a benchmark greatly reduces the value of the results
- none of these benchmarks seems to include code that measures the performance of cache eviction
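To illustrate the concurrency point: a representative benchmark has to issue reads and writes from several clients at once. A minimal sketch (hypothetical — the real benchmarks are Ruby/Rails, and `SimpleCache` here is only a lock-protected stand-in for an actual memcached/MongoDB/Redis client, which would add network and serialization cost):

```python
import threading
import time

# Hypothetical stand-in for a cache client; a real driver would talk
# to memcached/MongoDB/Redis over the network.
class SimpleCache:
    def __init__(self):
        self._store = {}
        self._lock = threading.Lock()

    def write(self, key, value):
        with self._lock:
            self._store[key] = value

    def read(self, key):
        with self._lock:
            return self._store.get(key)

def worker(cache, tid, iterations):
    # Each thread interleaves writes and reads, the pattern the
    # original single-threaded benchmarks never exercise.
    for i in range(iterations):
        key = f"key-{tid}-{i}"
        cache.write(key, "value")
        assert cache.read(key) == "value"

cache = SimpleCache()
start = time.perf_counter()
threads = [threading.Thread(target=worker, args=(cache, t, 20_000))
           for t in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
print(f"8 threads x 20000 ops each: {elapsed:.2f}s")
```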
1. Except in the case where any of these forces a disk write. ↩
Original title and link: Rails Caching Benchmarked: MongoDB, Redis, Memcached ( ©myNoSQL)