


Hadoop and HBase Status Updates after Hadoop Summit

As you might expect after such a large summit, there are plenty of updates coming in.

For now I’ve selected two, but if you find others just as interesting, please share them with us.

James Hamilton, summarizing a colleague’s report:

Key Takeaways

  • Yahoo and Facebook operate the world’s largest Hadoop clusters, with 4,000 and 2,300 nodes holding 70 and 40 petabytes, respectively. They run full cluster replicas to ensure availability and data durability.
  • Yahoo released Hadoop security features with Kerberos integration, which is most useful for long-running multitenant Hadoop clusters.
  • Cloudera released a paid enterprise version of Hadoop with cluster management tools and several DB connectors, and announced support for Hadoop security.
  • Amazon Elastic MapReduce announced expand/shrink cluster functionality and paid support.
  • Many Hadoop users run it in conjunction with NoSQL databases such as HBase or Cassandra.

James Hamilton

Tim Sells has an extensive report on HBase status:

The next version will be 0.90. It will be a reliability release, but it also includes performance gains. The version numbering will break from Hadoop’s version numbers; 0.90 was chosen because there is a belief that HBase is maturing towards a 1.0 release.

The main points I picked up are:

  • New batch importing allows writing HFiles directly and then simply telling HBase where they are.
  • Taking advantage of appends in HDFS for genuine durability.
  • The NameNode single point of failure is being addressed; Facebook is planning to release their HA NameNode.
  • Replication between clusters, allowing cross-data-center replication. It is eventually consistent.
  • Tighter integration with ZooKeeper through a master rewrite.
  • Significant work to make behaviour during compactions and splits less temperamental.
  • Facebook are planning to release their distribution of Hadoop and their highly available NameNode.

Tim Sells