Hypertable: All content tagged as Hypertable in NoSQL databases and polyglot persistence
Thursday, 29 July 2010
Quick Dive into Hypertable Thrift API
I like the parallels with notions from the MySQL world:
[…] let’s take a look at high performance reading using Scanner. To those who are familiar with MySQL, the concept of using scanner is quite similar to the SSCursor. Instead of reading all the records into client side memory, there is a server-side cursor that’s “streaming” the result set to client side.
via: http://notes.alexdong.com/quick-introduction-to-hypertables-thrift-api
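To make the buffered-versus-streaming distinction from the quote concrete, here is a minimal Python sketch. It does not use the Hypertable Thrift API: `FakeServer`, `fetch_all`, `fetch_batch`, and `scan` are hypothetical names made up for illustration, and the "server" is just an in-memory list. The only point is the shape of the client code: a buffered read pulls every row at once, while a scanner-style read pulls small batches on demand.

```python
from typing import Iterator, List, Tuple

Row = Tuple[str, str]


class FakeServer:
    """Hypothetical stand-in for a table server (NOT the Hypertable API)."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = list(rows)
        self.round_trips = 0  # counts simulated client/server calls

    def fetch_all(self) -> List[Row]:
        # Buffered read: the whole result set is shipped to the client.
        self.round_trips += 1
        return list(self._rows)

    def fetch_batch(self, offset: int, size: int) -> List[Row]:
        # Cursor-style read: only `size` rows are shipped per call.
        self.round_trips += 1
        return self._rows[offset:offset + size]


def scan(server: FakeServer, batch_size: int = 2) -> Iterator[Row]:
    """Yield rows one at a time, asking the server for one batch at a time."""
    offset = 0
    while True:
        batch = server.fetch_batch(offset, batch_size)
        if not batch:  # an empty batch signals the end of the result set
            return
        yield from batch
        offset += len(batch)
```

With this sketch, iterating over `scan(server)` never holds more than one batch in client memory, at the cost of more round trips than a single `fetch_all()` call.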
Tuesday, 29 June 2010
Who gives lowest read latency? Cassandra, HBase, Hypertable, or Voldemort?
Interesting question on Hacker News with good/informed comments so far.
I’ve got a great deal of information that I need to store in a key value format. I need access to that data as quickly as possible. Writes are only going to occur quarterly. Any thoughts?
Friday, 25 June 2010
NoSQL benchmarks and performance evaluations
Some say it is the right time to start having these around. Others say it’s way too early to start the “battle”. Users do want to see them, and when they’re lacking they create their own, most of the time using incomplete or wrong approaches.
But what am I talking about? As some of you might have guessed already:
NoSQL benchmarks and performance evaluations!
With their recent release of Riak 0.11.0, the Basho guys have also published their internal ☞ benchmarking code. Similar internal benchmark code is ☞ available for MongoDB.
But users are more interested in seeing cross-product benchmarks, even if constructing these is usually extremely complicated and they often end up comparing apples with oranges.
All this being said, and accepting that most of the time someone will figure out a way to invalidate the results, let’s see what cross-product benchmarks we have in the NoSQL space.
Yahoo! Cloud Serving Benchmark
The Yahoo! Cloud Serving Benchmark’s goal is to facilitate performance comparisons of the new generation of cloud data serving systems. The source code is available on ☞ GitHub and Yahoo! has also published ☞ the results of running this benchmark against Cassandra, HBase, Yahoo!’s PNUTS, and a simple sharded MySQL implementation.
VoltDB Benchmark
VoltDB, a new storage solution that calls itself the next-generation SQL RDBMS with ACID for fast-scaling OLTP applications, has recently ☞ published the results of their benchmark comparing VoltDB and Cassandra.
It is worth noting that while this is one of those apples-to-oranges comparisons (nb: the authors are well aware of it), there are still a couple of interesting and useful things to learn from it (e.g. the benchmarking procedure, the tested scenarios, etc.).
Unfortunately at this time the source code is not yet available, but hopefully we will see it soon:
Going forward, we’re planning to release the code we used to do these benchmarks. We’d also like to try a few other storage layers
Hypertable and HBase Performance Evaluation
The guys behind Hypertable ☞ have published the results of comparing Hypertable with HBase using a benchmark based on the Google BigTable paper[1], from which both HBase and Hypertable inherit their architecture. Unfortunately, the benchmark code is not available at this moment.
Update: thanks to Stu Hood, I now know that the code for this benchmark ships with the Hypertable distribution, available ☞ here (tar.gz), and that the configuration files are also available ☞ here (tar.gz).
So, as far as I could gather we have:
- ☞ Riak internal benchmark
- ☞ MongoDB internal benchmark
- ☞ Yahoo! Cloud Serving Benchmark
- results only of VoltDB Benchmark comparing VoltDB and Cassandra
- BigTable-inspired benchmark comparing Hypertable and HBase
Did I miss any?
More Integrations for Hive
Hive is a data warehouse infrastructure built on top of Hadoop, offering tools for data ETL, a mechanism to put structure on the data, and the capability to query and analyze large data sets stored in Hadoop[1]. To better understand the benefits of Hive, you can check how Facebook is using Hive to deal with a petabyte-scale data warehouse.
Recently, John Sichi, a member of the Data Infrastructure team at Facebook, published an article on integrating Hive and HBase. There is also interest in having Hive work with Cassandra, and this is ☞ tracked in Cassandra JIRA (nb: not sure there’s any advance on this yet though).
Hypertable, another wide-column store, provides a way to integrate with Hive, described ☞ here:
Hypertable-Hive integration allows Hive QL statements to read and write to Hypertable via SELECT and INSERT commands. […] Currently the Hypertable storage handler only supports external, non-native tables.
Somehow all this work to provide a common data warehouse infrastructure on top of existing NoSQL solutions (or at least the wide-column stores which are focused on large scale datasets) seems to confirm there’s no need for a common NoSQL language.
- [1] From the ☞ Hive wiki page
Tuesday, 1 June 2010
NoSQL Ecosystem News & Links 2010-06-01
- Richard Boulton: ☞ Using Redis as a backend for Xapian. An interesting analysis of how a dedicated search engine would work with a Redis backend. Meanwhile, others try to simply store the inverted index in Redis.
- Paul Rosania: ☞ Point-and-Click install of MongoDB on OS X 10.5+. Not that it was difficult before, but nice to have!
- Doug Judd: ☞ Why We Started Hypertable, Inc. … or welcome to the Hypertable Inc. blog.
- Surya Surabarapu: ☞ Terrastore Scala Client. First Terrastore library in our NoSQL libraries list.
Wednesday, 14 April 2010
Performance tests for Hypertable
Finding news about Hypertable seems to be pretty difficult, so I thought I should share this one even if it is not quite fresh nor extremely detailed.
Unfortunately, it’s difficult to tell from the slides what the tested scenarios were. If you have more details, please share them with us so I can update the post.
Thursday, 24 December 2009
NoSQL panel for OSCON 2010 Taking Shape
It looks like OSCON 2010 will host an extremely interesting NoSQL panel whose guests will be:
- Doug Judd: Hypertable
- Emil Eifrém: Neo4j
- Jonathan Ellis: Cassandra
- Justin Sheehy: Riak
- J. Chris Anderson: CouchDB
- K.S. Bhaskar: GT.M
Now if only I could make it there or, even better, moderate the panel.
via: http://groups.google.com/group/nosql-discussion/browse_thread/thread/2ec92eb6e8688871?pli=1
Tuesday, 15 December 2009
Cassandra Winning the NoSQL Race… Is It Really?
Tony Bain was probably ☞ tricked into thinking so based on news that Cassandra is used by Digg [1], Twitter [2], etc. To me those are just signs that:
- Cassandra has finally gathered a community behind it [3]
- they have identified good or common use cases
Secondly, the NoSQL world is quite wide. Cassandra is a column-oriented store (in the same category: BigTable, Hypertable, HBase), but we also have key-value stores, document stores, and graph stores (see [4], [5] and [6] for more details), so saying that it is winning the race is incorrect. At best, it should be compared with the other column-oriented solutions.
Thinking of HBase, we recently learnt [7] that it is doing well too, that there are real-life production applications running on it, and that it has seen good performance improvements over the last couple of releases. And as far as I know, there is a larger community behind it.
You should also check the HBase vs. Cassandra: NoSQL Battle! article to better understand how they compare and where they differ and also Cassandra Gets (Better) Documentation for some very good references.
References
- ☞ Is Cassandra winning the NoSQL race?
- [1] ☞ Looking to the future with Cassandra (Digg)
- [2] ☞ up and running with cassandra (Twitter)
- [3] Cassandra Gets (Better) Documentation
- [4] ☞ NoSQL Ecosystem
- [5] ☞ Quick Reference to Alternative data storages
- [6] Musings on NoSQL
- [7] NoSQL with HBase
- HBase vs. Cassandra: NoSQL Battle!
Friday, 4 December 2009
No Relation: The Mixed Blessings of Non-Relational Databases
A paper by Ian Thomas Varley, M.S.E. covering the following aspects of non-relational databases:
- use cases
- pros and cons
- design strategies
The paper in PDF format can be downloaded from ☞ here