
hypertable: All content tagged as hypertable in NoSQL databases and polyglot persistence

Hypertable Revival. Still the wrong strategy

After a very long silence (my last post about Hypertable dates back to Oct. 2010: NoSQL database architectures and Hypertable), there seems to be a bit of a revival in the Hypertable space:

  1. there are new packages of (commercial) services (PR announcement):
    1. Uptime support subscription
    2. Training and certification
    3. Commercial license
  2. it seems like Hypertable has a customer in Rediff.com (India)
  3. it is taking yet another stab at HBase performance

While I’m somewhat glad that Hypertable didn’t hit the deadpool, it’s quite disappointing that they are still trying to use this old and completely useless strategy of attacking another product in the market.

There are probably many marketers out there encouraging companies to use this old trick of getting attention by attacking the market leader[1]. And one of the simplest ways of doing that is by saying “mine is bigger than yours”.

But these days this strategy isn’t working anymore for quite a few reasons:

  1. Benchmarks are incorrect most of the time, so the attention ends up pointed in the wrong direction.

    In the case of the Hypertable vs HBase benchmark, JD Cryans (HBase veteran) is disputing the results.

  2. For existing users, performance issues are already known. Performance issues are also known by core developers that are always working to address them. So nothing new, just some angry users of the attacked product.

  3. For new users, performance is just one aspect of the decision. Most of the time, it’s one of the last considered. Community, support, adoption, and well-known case studies are much more important.

Attacking competitors based on feature checklists might be slightly effective in attracting a bit of attention, but it’s not the strategy to get users and customers and grow a community.


  1. HBase might not be a market leader, but it is definitely one of the NoSQL databases that have seen a few very large deployments.

Original title and link: Hypertable Revival. Still the wrong strategy (NoSQL databases © myNoSQL)


NoSQL Databases: What, Why, and When

Lorenzo Alberton with an overview of the NoSQL landscape:

NoSQL databases get a lot of press coverage, but there seems to be a lot of confusion surrounding them, as in which situations they work better than a Relational Database, and how to choose one over another. This talk will give an overview of the NoSQL landscape and a classification for the different architectural categories, clarifying the base concepts and the terminology, and will provide a comparison of the features, the strengths and the drawbacks of the most popular projects (CouchDB, MongoDB, Riak, Redis, Membase, Neo4j, Cassandra, HBase, Hypertable).


Where Riak Fits? Riak’s Sweetspot

Martin Schneider (Basho) trying to answer the question in the title:

Riak can be a data store to a purpose-built enterprise app; a caching layer for an Internet app, or part of the distributed fabric and DNA of a Global app. Those are of course highly arbitrary and vague examples, but it shows how flexible Riak is as a platform.

“Can be” is not quite the same as being the right solution, and even less so the best solution. And Martin’s answer to this is:

For super scalable enterprise and global apps — those where the data inside is inherently valuable and dependability of the system to capture, process and store data/writes is imperative — well I see Riak outperforming any perceived competitor in the space in providing value here.

But even for these scenarios, there’s competition from solutions like Cassandra, HBase, and Hypertable — the whole spectrum of scalable storage solutions based on Google BigTable and Amazon Dynamo being covered: HBase (a BigTable implementation), Cassandra (a solution using the BigTable data model and the Dynamo distributed model), and Riak (a solution based mainly on the Amazon Dynamo paper).

While Riak presents itself as the cleanest Dynamo-based solution, I would venture to say that both Cassandra and HBase come to the table with some interesting characteristics that cannot be ignored:

  1. Strong communities and community-driven development processes: both HBase and Cassandra are top-level Apache Software Foundation projects
  2. Excellent integration with Hadoop, the leading batch processing solution. DataStax, the company offering services for Cassandra, went the extra mile of creating a custom Hadoop solution, Brisk, making this integration even better.

Bottom line, I don’t think we can declare a winner in this space and I believe all three solutions will stay around for a while competing for every scenario requiring dependability of the system to capture, process and store data.

Original title and link: Where Riak Fits? Riak’s Sweetspot (NoSQL databases © myNoSQL)


Cloudata: New Open Source BigTable Implementation

Cloudata is the third open source implementation of Google’s BigTable paper, after HBase and Hypertable[1]. There’s already a 1.0 version, even if the GitHub project page lists just a couple of commits.

From the home page, Cloudata’s current features:

  • Basic data service
    • Single row operation(get, put)
    • Multi row operation(like, between, scanner)
    • Data uploader(DirectUploader)
    • MapReduce(TabletInputFormat)
    • Simple cloudata query and supports JDBC driver
  • Table Management
    • split
    • distribution
    • compaction
  • Utility
    • Web based Monitor
    • CLI Shell
  • Failover
    • Master failover
    • TabletServer failover
  • Change log Server
    • Reliable fast appendable change log server
  • Support language
    • Java, RESTful API, Thrift

I couldn’t figure out if this is just an experiment or if it actually plans to be a real project.

Update: Cloudata’s author, Jsjangg, mentions in the comment thread that Cloudata has been used at www.searcus.com for 2 years already, running on a 20-machine cluster.


  1. See why I haven’t included Cassandra in this list in the comment thread.  

Original title and link: Cloudata: New Open Source BigTable Implementation (NoSQL databases © myNoSQL)


6 Criteria for Real Column Stores

Michael Stonebraker has published an article on the Vertica blog presenting 6 criteria for characterizing the completeness of a column store implementation:

I/O Characteristics

  • IO-1 (basic column store): Every storage block contains data from only ONE column.
  • IO-2: Aggressive compression
  • IO-3: No record-ids

CPU Characteristics

  • CPU-4: A column executor
  • CPU-5: Executor runs on compressed data
  • CPU-6: Executor can process columns that are key sequence or entry sequence
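
To make a few of these criteria more concrete, here is a minimal Python sketch. It is my own illustration, not code from Michael's article and not how any particular product works: the sample table, the function names, and the choice of run-length encoding as the compression scheme are all assumptions. It shows a column-per-block layout (IO-1), a simple compression (IO-2), rows identified by position rather than stored record-ids (IO-3), and aggregation running directly on compressed data (CPU-5).

```python
from itertools import groupby

# Hypothetical sample table, row-oriented: one tuple per record.
ROWS = [
    ("DE", "books", 10),
    ("DE", "books", 10),
    ("DE", "music", 7),
    ("US", "music", 7),
    ("US", "music", 7),
    ("US", "music", 3),
]
COLUMNS = ("country", "category", "amount")


def to_columns(rows, names):
    """IO-1: keep each column in its own list, so a block holds one column.
    IO-3: no record-ids are stored; a row is identified by its position."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def rle_encode(values):
    """IO-2 (one simple form of compression): run-length encode a column
    into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_sum(runs):
    """CPU-5: sum a numeric column directly on its compressed runs,
    without expanding them back into individual values."""
    return sum(value * length for value, length in runs)


def rle_count_equal(runs, target):
    """CPU-5 again: count matching rows by looking at one entry per run."""
    return sum(length for value, length in runs if value == target)


columns = to_columns(ROWS, COLUMNS)
compressed = {name: rle_encode(values) for name, values in columns.items()}

total_amount = rle_sum(compressed["amount"])
us_rows = rle_count_equal(compressed["country"], "US")
```

The point of the sketch is only that both the sum and the count touch one entry per run instead of one entry per row, which is the kind of benefit the I/O and CPU criteria are after.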

Michael’s post is going after the big fish in the ocean (SybaseIQ, EMC Greenplum, Aster Data, Oracle) and in case this is the area that interests you, you should also check Curt Monash’s follow-up.

But getting back to these 6 criteria for column stores, I confess that this time these seem to make a lot of sense. So, I’m wondering how NoSQL column-stores (Cassandra, HBase, and Hypertable) are doing from this perspective. I’d really appreciate some expert comments so we have a follow-up with the status of NoSQL column-stores according to these criteria.

Update: Alex Feinberg pointed me to Daniel Abadi’s article that clarifies the distinction between the solutions Michael’s post mentions and the new NoSQL column stores.

While I didn’t remember this exact article, I’ve continued to maintain this separation, and my post’s intention is to make sure the separation is kept, but also to get expert feedback on the following questions:

  • do any of these criteria apply to NoSQL column stores?
  • if a criterion applies, then how do NoSQL column stores score on it?
  • if a criterion doesn’t apply, why doesn’t it apply?

Original title and link: 6 Criteria for Real Column Stores (NoSQL databases © myNoSQL)


NoSQL Database Architectures & Hypertable

In the series of NoSQL videos for the weekend, today we have Doug Judd’s presentation from October’s HackerDojo on NoSQL database architecture and Hypertable.

Original title and link: NoSQL Database Architectures & Hypertable (NoSQL databases © myNoSQL)


NoSQL Frankfurt: A Quick Review of the Conference

Yesterday was the NoSQL Frankfurt conference and today we have the chance to review some of the slide decks presented.

Beyond NoSQL with MarkLogic and The Universal Index

Nuno Job (@dscape) has presented on MarkLogic (an XML server we haven’t talked much about), its universal index, and a couple of other interesting features.

The GraphDB Landscape and sones

Achim Friedland (@ahzf) has provided a very interesting overview of graph database products, the goals and some scenarios for graph databases, a brief comparison of property graphs with other models (relational databases, object-oriented, semantic web/RDF), and many other interesting aspects.

Data Modeling with Cassandra Column Families

Gary Dusbabek (@gdusbabek) has covered data modeling with Cassandra (a topic I still find to be one of the most complicated).

Neo4j Spatial - GIS for the rest of us

Peter Neubauer (@peterneubauer) covered another interesting topic in the data space: geographic information (GIS) in graph databases.

Even though GISers suggested this integration some time ago, Neo4j announced support for GEO only recently.

Cassandra vs Redis

Tim Lossen’s (@tlossen) slides compare Cassandra and Redis from the perspective of a Facebook game’s requirements. All I can say is that the conclusion is definitely interesting, but you’ll have to check the slides yourselves.

Mastering Massive Data Volumes with Hypertable

Doug Judd, who impressed me with his fantastic Hypertable: The Ultimate Scaling Machine at the Berlin Buzzwords NoSQL conference, gave a talk on Hypertable, its architecture and performance. The presentation also mentioned two Hypertable case studies: Zvents (an analytics platform) and Rediff.com (spam classification)[1].

More presentations will be added as I receive them.


  1. Just recently I’ve posted about Hadoop being used for spam detection.

Original title and link: NoSQL Frankfurt: A Quick Review of the Conference (NoSQL databases © myNoSQL)


Hypertable 0.9.4.1 Minor Bug Fix Release

New Hypertable minor release to fix a bug in the Hive extension. Complete change log ☞ here. Download ☞ here.

Original title and link: Hypertable 0.9.4.1 Minor Bug Fix Release (NoSQL databases © myNoSQL)


Hypertable 0.9.4.0 Released, Over 40 Improvements and Bug Fixes

Many improvements to garbage collection, a Hypertable monitoring web interface, upgraded Thrift and many more. The complete list of changes for Hypertable 0.9.4.0 can be found ☞ here.

I’ve also embedded a presentation by Doug Judd on Hypertable (nb: if you prefer videos you should check this great presentation: Hypertable: The Ultimate Scaling Machine).

Original title and link: Hypertable 0.9.4.0 Released, Over 40 Improvements and Bug Fixes (NoSQL databases © myNoSQL)


Hypertable: The Ultimate Scaling Machine

Fantastic presentation by Doug Judd covering not only Hypertable but also other really scalable NoSQL databases:

The session was recorded at the Berlin Buzzwords conference. Here is the list of my favorite presentations from the event.

Original title and link for this post: Hypertable: The Ultimate Scaling Machine (published on the NoSQL blog: myNoSQL)


Quick Dive into Hypertable Thrift API

I like the parallels with notions from the MySQL world:

[…] let’s take a look at high performance reading using Scanner. To those who are familiar with MySQL, the concept of using scanner is quite similar to the SSCursor. Instead of reading all the records into client side memory, there is a server-side cursor that’s “streaming” the result set to client side.

via: http://notes.alexdong.com/quick-introduction-to-hypertables-thrift-api
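
To illustrate the difference the quote describes, here is a minimal Python sketch contrasting a buffered read with a streaming scan. It is only a toy model under my own assumptions: `ToyServer`, `fetch_all`, `open_scanner`, and `scan` are hypothetical names made up for this example, and none of this is the actual Hypertable Thrift API or MySQL's SSCursor.

```python
from typing import Iterator, List, Tuple

Row = Tuple[str, str]


class ToyServer:
    """Hypothetical stand-in for a table server; not Hypertable's Thrift API."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows

    def fetch_all(self) -> List[Row]:
        # Buffered style: the whole result set is materialized for the client.
        return list(self._rows)

    def open_scanner(self, batch_size: int) -> Iterator[List[Row]]:
        # Streaming style: hand out one batch of rows at a time.
        for start in range(0, len(self._rows), batch_size):
            yield self._rows[start:start + batch_size]


def scan(server: ToyServer, batch_size: int = 2) -> Iterator[Row]:
    """Yield rows one by one while holding at most one batch at a time."""
    for batch in server.open_scanner(batch_size):
        yield from batch


server = ToyServer([(f"row{i}", f"value{i}") for i in range(5)])

buffered = server.fetch_all()          # all rows held by the client at once
streamed = scan(server, batch_size=2)  # lazy: nothing is pulled yet
first = next(streamed)                 # pulls only the first batch
rest = list(streamed)                  # drains the remaining batches
```

Both styles return the same rows in the same order; the difference is that the scanner style never needs the full result set in client memory, which is what matters once the table is much larger than this toy one.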


Who gives lowest read latency? Cassandra, HBase, Hypertable, or Voldemort?

Interesting question on Hacker News with good/informed comments so far.

I’ve got a great deal of information that I need to store in a key value format. I need access to that data as quickly as possible. Writes are only going to occur quarterly. Any thoughts?

via: http://news.ycombinator.com/item?id=1470521