Deploying Hadoop With Serengeti

Duncan Epping timed how long it would take to deploy Hadoop with Serengeti:

How long did that take me? Indeed ~10 minutes

So, Project Serengeti is a sort of Apache Whirr for VMware vSphere.

VMware Project Serengeti: Virtualization-Friendly Hadoop

VMware Project Serengeti:

Serengeti is an open source project initiated by VMware to enable the rapid deployment of an Apache Hadoop cluster (HDFS, MapReduce, Pig, Hive, ..) on a virtual platform.

Serengeti 0.5 currently supports vSphere, with the ability to support other platforms. The project is at an early stage, and is endorsed by all major Hadoop distributions including Cloudera, Greenplum, Hortonworks and MapR.

The Hadoop wiki has a page dedicated to running Hadoop in a virtual environment. There’s also the recent post by Steve Loughran about the pros and cons of Hadoop in the cloud, and a paper authored by VMware about virtualizing Apache Hadoop (pdf).

How to run Redis natively on Xen

In this post we will investigate how Redis, a popular key-value storage, can be run natively on Xen, i.e., without the support of a conventional operating system such as Linux, and what implication this has on the performance.

When reading this, my first thought was that we might be looking at a new era: bare metal virtualized databases (as in a database running directly on a virtualized environment without an OS). But then I realized that, except for Redis, I don’t think any other NoSQL database would be able to run in this environment (in fact, this experiment had to give up a couple of Redis features to make it work). So, dream over.

Membase on VMware

James Phillips of NorthScale talks about scaling out with Membase on VMware (the real interview starts at around 1’35”):

Considering Membase is persisting to disk (as opposed to its little brother memcached which is memory only)[1], I’m wondering if virtualized environments provide good enough IO.

  1. Like many other DBMSs, Membase keeps “hot data” in memory, but it also writes it to disk for durability.

CouchDB in a VirtualBox

Good for testing (but not for performance):

installing couchdb on ubuntu is a cinch: sudo apt-get install couchdb

As Stephan Schmidt posted today on Twitter:

Hurray for someone telling the truth: “Like most datastores, Riak will run best when not virtualized.” - V is NOT magically adding capacity