Reports from NoSQL Live in Boston

In case you weren’t able to make it to the NoSQL Live in Boston event and you don’t have the patience to wait for the videos to come out, I have found a couple of reports from the event.

From the ☞ End Point’s Blog:

I went in feeling convinced of the desirability of non-relational datastores for specific modeling situations (graphs) and for scalability/availability/volume concerns (Dynamo and BigTable derivatives), while feeling relatively skeptical of “document datastores”. I left feeling basically the same way, though decidedly less skeptical of CouchDB than I previously was.

And then in a ☞ follow-up post:

The simplicity of the pure key/value store (Voldemort and Riak are more like this) brings flexibility in what you represent; having a somewhat more structured data model with which to work (as in Cassandra) can add some complexity to how you design your data, but brings improved flexibility in how you can navigate that data. (my note: very interesting remark)


[…] one might get the impression that Cassandra has the broadest range of interesting deployments, Voldemort has fewer but is still interesting (Linkedin is certainly no slouch), and Riak has nothing to point to outside Basho Technologies’ non-free Enterprise variant.

Last, but not least, judging by what happened in the last couple of weeks, it looks like the myNoSQL post on Cassandra @ Twitter has made quite some waves:

Of the three projects mentioned, Cassandra clearly has the “momentum” (a highly accurate indicator of future dominance).

Adam Marcus posted ☞ a long blog post that summarizes most of the talks and panels. As you’d expect, the most interesting discussions seem to have happened on the panels: “Scaling with NoSQL” (between memcached, Voldemort, Hypertable, Cassandra, HBase), “Schema design and document-oriented DBs” (CouchDB, MongoDB, Riak), and “Evolution of a Graph Data structure from research to production” (HypergraphDB, Neo4j, W3C RDF).

Some cool things covered on the Scaling with NoSQL panel:

  • what’s life like for operations folks?
    • Voldemort: little babysitting
    • Cassandra: the engineering team is the operations team
    • Hypertable: easy to deploy, but harder to get HDFS right
    • HBase: config changes require rsyncing configs to all machines, which doesn’t scale well. Twitter’s Ryan King suggests Capistrano
  • use cases/deployments in the wild
  • random bits:
    • HDFS not designed for lots of random reads
    • Hypertable vs. HBase: Judd says C++ makes for a more efficient memory and CPU footprint. (note: this sounds like quite an old argument)
    • Voldemort is a persistent key-value store, whereas memcached is not persistent
    • BigTable folks point out that range scans suck in all other systems. Automatic partitioning (at least in Cassandra) needs some love as well
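To make the range scan remark a bit more concrete, here is a minimal Python sketch of why key placement matters. It is my own illustration, not code from any of the systems on the panel: the function names, the three-partition layout, and the split points are all hypothetical. The idea is simply that when keys are spread by hash, neighbouring keys land on unrelated partitions, so a range scan has to ask every partition, whereas an order-preserving layout only touches the partitions that overlap the range.

```python
import bisect
import hashlib


def hash_partition(key, num_partitions):
    """Place a key by a stable hash; key order is NOT preserved across partitions."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def order_partition(key, split_points):
    """Place a key by comparing it against sorted split points (order-preserving)."""
    # Partition i holds keys k with split_points[i-1] <= k < split_points[i].
    return bisect.bisect_right(split_points, key)


def partitions_for_range_hash(start, end, num_partitions):
    """Any key in [start, end] may hash anywhere, so every partition must be scanned."""
    return set(range(num_partitions))


def partitions_for_range_ordered(start, end, split_points):
    """Only the partitions overlapping [start, end] need to be contacted."""
    first = order_partition(start, split_points)
    last = order_partition(end, split_points)
    return set(range(first, last + 1))


if __name__ == "__main__":
    splits = ["h", "p"]  # hypothetical split points, giving three partitions
    print(partitions_for_range_hash("a", "c", 3))
    print(partitions_for_range_ordered("a", "c", splits))
```

The flip side, which is probably where the “needs some love” comment comes from, is that an order-preserving layout can end up with hot or oversized partitions when keys are not evenly distributed, so the split points have to be rebalanced over time.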

Topics covered on the Schema design and document-oriented DBs panel:

  • indexing
  • foreign keys and relationships
  • schemas/migrations (?)
  • horizontal partitioning (note: it is interesting that neither MongoDB nor CouchDB has anything working out of the box)
  • consistency

I had the chance to watch the Evolution of a Graph Data structure from research to production panel myself. It was very interesting and covered subjects like:

  • query model
  • implementation details
  • support for schemas (for transfer of knowledge inside your team)
  • use cases/live deployments

For a personal, per-project overview of the event, you could check Brian R. Jackson’s ☞ post, covering Cassandra, Memcached, Tokyo Cabinet, Hypertable, HBase.

I hope the videos will be out pretty soon so you’ll have a chance to watch them yourself.