voltdb: All content tagged as voltdb in NoSQL databases and polyglot persistence
A paper on integrating VoltDB and Hadoop. From what I read, for now it only works in a single direction (exporting data from VoltDB to Hadoop):
It is possible to design and develop a complete business solution utilizing both VoltDB and Hadoop from scratch. But you do not need to. VoltDB simplifies the process by providing an export facility that lets you automatically archive selected data from the VoltDB database. And you can use this export functionality with Hadoop.
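As a rough illustration of what such an export facility does, here is a minimal, hypothetical sketch. It assumes that rows written to tables selected for export are queued and later drained in batches as CSV text, which could then be loaded into Hadoop. `ExportQueue`, `on_insert` and `drain_to_csv` are illustrative names only; this is not VoltDB's export API.

```python
import csv
import io
from collections import deque


class ExportQueue:
    """Hypothetical stand-in for an export facility: rows written to tables
    selected for export are queued, then drained in batches."""

    def __init__(self, export_tables):
        self.export_tables = set(export_tables)
        self.pending = deque()  # (table, row) tuples awaiting export

    def on_insert(self, table, row):
        # Only rows written to tables selected for export are archived.
        if table in self.export_tables:
            self.pending.append((table, row))

    def drain_to_csv(self, max_rows=1000):
        # Pop up to max_rows queued rows and render them as CSV text
        # (e.g. to be written to a file that is then copied into HDFS).
        out = io.StringIO()
        writer = csv.writer(out)
        written = 0
        while self.pending and written < max_rows:
            table, row = self.pending.popleft()
            writer.writerow([table, *row])
            written += 1
        return out.getvalue()


# Usage: only the export table's row ends up in the batch.
queue = ExportQueue(export_tables=["orders_archive"])
queue.on_insert("orders_archive", (1, "alice", 9.99))
queue.on_insert("sessions", (42, "tmp"))  # not selected for export: ignored
batch = queue.drain_to_csv()
```

Batching the drain keeps the export path decoupled from the transactional write path, which is the point of an archive-style export.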
Some say it is the right time to start having these around. Others say it is way too early to start the “battle”. Users do want to see them, and when they are lacking, users create their own, most of the time using incomplete or wrong approaches.
But what am I talking about? As some of you might have guessed already: NoSQL benchmarks.
Users, however, are most interested in seeing cross-product benchmarks, even though constructing these is extremely complicated most of the time and they often end up comparing apples with oranges.
All that being said, and accepting that someone will usually figure out a way to invalidate the results, let's see what cross-product benchmarks we have in the NoSQL space.
Yahoo! Cloud Serving Benchmark
The Yahoo! Cloud Serving Benchmark’s goal is to facilitate performance comparisons of the new generation of cloud data serving systems. The source code is available on ☞ GitHub and Yahoo! has also published ☞ the results of running this benchmark against Cassandra, HBase, Yahoo!’s PNUTS, and a simple sharded MySQL implementation.
VoltDB Benchmark
VoltDB, a new storage solution that calls itself “the next-generation SQL RDBMS with ACID for fast-scaling OLTP applications”, has recently ☞ published the results of a benchmark comparing VoltDB and Cassandra.
It is worth noting that, although this is one of those apples-to-oranges comparisons (and the authors are well aware of it), there are still a few interesting and useful things to learn from it (e.g. the benchmarking procedure and the tested scenarios).
Unfortunately, the source code is not yet available, but hopefully we will see it soon:
Going forward, we’re planning to release the code we used to do these benchmarks. We’d also like to try a few other storage layers
Hypertable and HBase Performance Evaluation
The guys behind Hypertable ☞ have published the results of comparing Hypertable with HBase, using a benchmark based on the Google BigTable paper, from which both HBase and Hypertable inherit their architecture.
Unfortunately, the benchmark code is not available at this moment.
So, as far as I could gather, we have:
- ☞ Riak internal benchmark
- ☞ MongoDB internal benchmark
- ☞ Yahoo! Cloud Serving Benchmark
- VoltDB benchmark comparing VoltDB and Cassandra (results only)
- BigTable-inspired benchmark comparing Hypertable and HBase (results only)
Did I miss any?
It is interesting to note that some of the VoltDB don’ts from the paper ☞ Do’s and Don’ts (pdf) validate some major assumptions in the NoSQL space:
- Don’t create tables with very large rows (that is, lots of columns or large VARCHAR columns). Several smaller tables with a common partitioning key are better.
Basically, both wide-column stores (e.g. Cassandra, HBase, Hypertable), with their column families, and document databases (e.g. CouchDB, MongoDB, RavenDB, Terrastore), with their schema-less approach, address this issue.
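The “several smaller tables with a common partitioning key” advice can be made concrete with a toy hash-partitioning sketch: if every narrow table is partitioned on the same key, all rows for that key live on the same partition and can be read together without a cross-partition lookup. The partition count, the use of crc32 as the hash, and the in-memory dicts standing in for partitions are all assumptions for illustration; the sketch makes no claim about how VoltDB itself assigns partitions, and `PartitionedStore`, `partition_for` and `fetch_all` are hypothetical names.

```python
from __future__ import annotations

import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical cluster size


def partition_for(key: str) -> int:
    # crc32 is deterministic across runs (unlike Python's salted hash()),
    # so every table maps the same key to the same partition.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


class PartitionedStore:
    def __init__(self) -> None:
        # partitions[p][table][key] -> row
        self.partitions = [defaultdict(dict) for _ in range(NUM_PARTITIONS)]

    def insert(self, table: str, key: str, row: dict) -> None:
        self.partitions[partition_for(key)][table][key] = row

    def fetch_all(self, key: str, tables: list[str]) -> dict:
        # All tables share the partitioning key, so a single partition
        # holds every row for this key.
        part = self.partitions[partition_for(key)]
        return {t: part[t].get(key) for t in tables}


# A "wide" customer row split into two narrower tables sharing customer_id.
store = PartitionedStore()
store.insert("customer", "c42", {"name": "Ann"})
store.insert("customer_address", "c42", {"city": "Oslo", "zip": "0150"})
rows = store.fetch_all("c42", ["customer", "customer_address"])
```

Column families in wide-column stores and embedded sub-documents in document databases reach a similar result from the other direction: related data is grouped under one key rather than spread across one very wide row.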
- Don’t use ad hoc SQL queries as part of a production application.
Firstly, this points to the mindset change the NoSQL space requires when doing data modeling: think about data access patterns.
Secondly, it pretty much validates the CouchDB and RavenDB approach of defining queries (views and indexes, respectively) upfront, which makes their reads extremely fast.
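To make “queries defined upfront” concrete, here is a toy in-memory sketch loosely modeled on CouchDB-style map-only views: each query is declared as a map function before it is used, and its index is maintained on every write, so a read is a dictionary lookup rather than a scan. `View`, `Store` and `orders_by_customer` are hypothetical names; this is not the CouchDB or RavenDB API, and it ignores persistence, reduce functions and concurrency.

```python
from collections import defaultdict


class View:
    """Toy map-only view: the map function is declared upfront and its
    index is updated on every write, so reads are plain lookups."""

    def __init__(self, map_fn):
        self.map_fn = map_fn
        self.index = defaultdict(list)        # emitted key -> [(doc_id, value)]
        self.keys_by_doc = defaultdict(list)  # doc_id -> keys it emitted

    def remove(self, doc_id):
        # Drop entries emitted by a previous version of the document.
        for key in self.keys_by_doc.pop(doc_id, []):
            self.index[key] = [e for e in self.index[key] if e[0] != doc_id]

    def on_write(self, doc_id, doc):
        self.remove(doc_id)
        for key, value in self.map_fn(doc):
            self.index[key].append((doc_id, value))
            self.keys_by_doc[doc_id].append(key)

    def query(self, key):
        return list(self.index.get(key, []))


class Store:
    def __init__(self):
        self.docs = {}
        self.views = {}

    def define_view(self, name, map_fn):
        view = View(map_fn)
        for doc_id, doc in self.docs.items():  # backfill existing documents
            view.on_write(doc_id, doc)
        self.views[name] = view

    def put(self, doc_id, doc):
        self.docs[doc_id] = doc
        for view in self.views.values():
            view.on_write(doc_id, doc)

    def query(self, view_name, key):
        # Only predefined views can be queried: there is no ad hoc path.
        return self.views[view_name].query(key)


def orders_by_customer(doc):
    # Map function: emit (customer, total) for every order document.
    if doc.get("type") == "order":
        yield doc["customer"], doc["total"]


store = Store()
store.define_view("orders_by_customer", orders_by_customer)
store.put("o1", {"type": "order", "customer": "ann", "total": 10})
store.put("o2", {"type": "order", "customer": "bob", "total": 5})
store.put("o3", {"type": "order", "customer": "ann", "total": 7})
store.put("o1", {"type": "order", "customer": "bob", "total": 10})  # update moves o1
```

The trade-off mirrors the VoltDB advice: writes pay for index maintenance, and any query that was not declared in advance simply is not available.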