vmware: All content tagged as vmware in NoSQL databases and polyglot persistence

Hadoop Virtualization

Roberto V. Zicari interviewing Joe Russell[1] about Hadoop virtualization with Serengeti:

A common misconception when virtualizing Hadoop clusters is that we decouple the data nodes from the physical infrastructure. This is not necessarily true. When users virtualize a Hadoop cluster using Project Serengeti, they separate data from compute while preserving data locality. By preserving data locality, we ensure that performance isn’t negatively impacted, or essentially making the infrastructure appear as static. Additionally, it creates true multi-tenancy within more layers of the Hadoop stack, not just the name node.

I’m not 100% sure I get this, but the only way I could explain it to myself that actually makes sense is this: HDFS lives directly on the physical hardware and only the compute part is virtualized. Is that what he means?
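If that reading is right, the data/compute split should show up in the cluster definition itself: data node groups and compute node groups provisioned independently, with different storage. A hypothetical sketch of what such a Serengeti cluster spec might look like (group names, sizes, and storage values are illustrative, not the exact Serengeti schema):

```json
{
  "nodeGroups": [
    {
      "name": "data",
      "roles": ["hadoop_datanode"],
      "instanceNum": 4,
      "storage": { "type": "LOCAL", "sizeGB": 200 }
    },
    {
      "name": "compute",
      "roles": ["hadoop_tasktracker"],
      "instanceNum": 8,
      "storage": { "type": "SHARED", "sizeGB": 20 }
    }
  ]
}
```

With data nodes pinned to local disk and compute nodes free to be added or removed, the "separate data from compute while preserving data locality" claim at least becomes plausible.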

  1. Joe Russell is Product Line Marketing Manager at VMware. 

Original title and link: Hadoop Virtualization (NoSQL database©myNoSQL)


Traditional, NoSQL and NewSQL Are All Broken. All Data in Memory

Stacey Schneider for VMware:

Over the past few years, memory has gotten cheap and is easily commoditized in the cloud. So moving your data strategy to put it all in-memory just plain makes sense. It eliminates an extra hop to read and write data from disk, making it inherently faster and the performance more consistent. It also manages to simplify the internal optimization algorithms and reduce the number of instructions to the CPU making better use of the hardware.

This is the “conclusion” after “establishing” in the post that:

  1. traditional databases are already broken because of the fixed schemas and data being persisted on disk
  2. NoSQL databases are also broken because even if they have flexible schemas, data is still persisted on disk and “replication takes time to do all the read and writes”
  3. NewSQL are also broken because “the way the databases handles the data distribution makes it so there NewSQL databases do not scale linearly”

All this FUD just to promote GemFire and SQLFire? I really thought VMware was a serious company.

Original title and link: Traditional, NoSQL and NewSQL Are All Broken. All Data in Memory (NoSQL database©myNoSQL)


VMware Sues Hortonworks

Stay calm. Hadoop is safe.

The Register:

VMware has taken Hortonworks to court along with four ex-VMers who now work at the startup - and among them is VMWare’s former global sales chief.

Original title and link: VMware Sues Hortonworks (NoSQL database©myNoSQL)


Main Features of In-Memory Data Grids

Good article about In-Memory Data Grids on Cubrid’s blog by Ki Sun Song.

The features of IMDG can be summarized as follows:

  • Data is distributed and stored across multiple servers.
  • Each server operates in active mode.
  • The data model is usually object-oriented (serialized) and non-relational.
  • Servers can be added or removed as needed.
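The first three bullets can be sketched in a few lines: each key is hashed to one of several active servers, and values are stored in serialized object form rather than as relational rows. A minimal Python illustration (the `Node`/`Grid` names and the modulo-hash placement are simplifications of mine; real IMDGs use consistent hashing plus replication):

```python
import hashlib
import pickle

class Node:
    """One active server in the grid, holding serialized objects."""
    def __init__(self, name):
        self.name = name
        self.store = {}  # key -> pickled bytes

    def put(self, key, value):
        self.store[key] = pickle.dumps(value)  # object-oriented, non-relational

    def get(self, key):
        return pickle.loads(self.store[key])

class Grid:
    """Data is partitioned across multiple active servers by key hash."""
    def __init__(self, nodes):
        self.nodes = nodes

    def _owner(self, key):
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return self.nodes[h % len(self.nodes)]  # simplistic placement, no replication

    def put(self, key, value):
        self._owner(key).put(key, value)

    def get(self, key):
        return self._owner(key).get(key)

grid = Grid([Node("n1"), Node("n2"), Node("n3")])
grid.put("user:42", {"name": "Ada", "score": 17})
print(grid.get("user:42"))  # the object round-trips through serialization
```

The fourth bullet (elastic membership) is exactly what the modulo placement above handles badly: adding a node reshuffles most keys, which is why real grids use consistent hashing.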

Even if you don’t read it all but plan to use an IMDG solution, the first two questions to ask your vendor are: what approach do you propose for dealing with limited memory capacity, and what’s your strategy for reliability? You’ll get good answers from well-established products, but those answers won’t necessarily match the exact requirements of your solution.
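On the first question — limited memory capacity — the usual vendor answers are eviction policies (LRU and friends) and overflow to disk. An LRU cap is easy to sketch with Python’s `OrderedDict` (the `LruStore` name and `max_entries` knob are illustrative, not any product’s API):

```python
from collections import OrderedDict

class LruStore:
    """In-memory store that evicts the least-recently-used entry when full."""
    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.data = OrderedDict()  # insertion/recency order

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)  # refresh recency on overwrite
        self.data[key] = value
        if len(self.data) > self.max_entries:
            self.data.popitem(last=False)  # evict the least recently used entry

    def get(self, key):
        self.data.move_to_end(key)  # a read also refreshes recency
        return self.data[key]

store = LruStore(max_entries=2)
store.put("a", 1)
store.put("b", 2)
store.get("a")     # "a" is now the most recently used
store.put("c", 3)  # capacity exceeded: "b" is evicted
print(sorted(store.data))  # ['a', 'c']
```

The follow-up question for the vendor is what happens to an evicted entry — silently dropped (cache semantics) or spilled to a backing store (data-grid semantics) — since that choice decides whether the grid is your system of record.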

Original title and link: Main Features of In-Memory Data Grids (NoSQL database©myNoSQL)


Deploying Hadoop With Serengeti

Duncan Epping timing how long it would take to deploy Hadoop with Serengeti:

How long did that take me? Indeed ~10 minutes

So, Project Serengeti is a sort of Apache Whirr for VMware vSphere.

Original title and link: Deploying Hadoop With Serengeti (NoSQL database©myNoSQL)


Why Virtualize Hadoop and How Project Serengeti Can Help

A very long post by Richard McDougall explaining why virtualizing Hadoop may make sense and how VMware’s Project Serengeti can help. Answering the question in the title, McDougall enumerates 6 reasons:

  1. Consolidation/sharing of a big-data platform
  2. Rapid provisioning
  3. Resource sharing
  4. High availability
  5. Security
  6. Versioned Hadoop environments

He’s also addressing two of the most common questions about Hadoop virtualization:

  1. Isn’t there a large performance overhead? (nb: benchmark results are available in a whitepaper from VMware)
  2. Doesn’t vSphere use shared SAN storage only? (nb: the short answer is that vSphere supports both local and shared storage)

Project Serengeti


VMWare Project Serengeti: Virtualization-Friendly Hadoop

VMWare Project Serengeti:

Serengeti is an open source project initiated by VMware to enable the rapid deployment of an Apache Hadoop cluster (HDFS, MapReduce, Pig, Hive, ..) on a virtual platform.

Serengeti 0.5 currently supports vSphere, with the ability to support other platforms. The project is at an early stage, and is endorsed by all major Hadoop distributions including Cloudera, Greenplum, Hortonworks and MapR.

The Hadoop wiki has a page dedicated to running Hadoop in a virtual environment. And there’s also the recent post by Steve Loughran about pros and cons of Hadoop in the cloud and a paper authored by VMWare about virtualizing Apache Hadoop (pdf).

Original title and link: VMWare Project Serengeti: Virtualization-Friendly Hadoop (NoSQL database©myNoSQL)

EMC Contributes 1000+ Nodes Cluster for Apache Hadoop Testing

The Greenplum Analytics Workbench incorporates technology from the world’s leading software and hardware manufacturers with the intention of providing the infrastructure needed to facilitate Apache Hadoop innovation. The test bed cluster, which consists of 1,000+ hardware nodes or 10,000 nodes with the addition of virtual machines, features 24 petabytes of physical storage. This is the equivalent of nearly half of the entire written works of mankind, from the beginning of recorded history.


Original title and link: EMC Contributes 1000+ Nodes Cluster for Apache Hadoop Testing (NoSQL database©myNoSQL)


Sinatra with Redis on Cloud Foundry

The workshop takes you through creating a Sinatra application using sample code from here. Once the Sinatra application which leverages Twitter is working, the workshop then takes you through adding Redis to your application. Finally the workshop ends after taking you through scaling your application instances up and then back down.

Only 15 minutes to get it up and running.

Original title and link: Sinatra with Redis on Cloud Foundry (NoSQL databases © myNoSQL)

Cloud Foundry, NoSQL Databases, and Polyglot Persistence

VMWare’s Cloud Foundry has the potential to become the preferred PaaS solution. It bundles together a set of services that took other PaaS providers (Google App Engine, Microsoft Azure) years to offer. And it seems that Cloud Foundry has much less vendor lock-in (or none at all)[1].

From a storage perspective, Cloud Foundry encourages polyglot persistence right from the start, offering access to a relational database (MySQL), a super-fast smart key-value store (Redis), and a popular document database (MongoDB). The only bit missing is a graph database[2].

I think the first graph database to get there will see an immediate bump in its adoption.

  1. These comments are based on what I’ve read about VMWare CloudFoundry as I haven’t received (yet) my invitation.  

  2. I don’t think wide-column databases (Cassandra, HBase) are fit for PaaS  

Original title and link: Cloud Foundry, NoSQL Databases, and Polyglot Persistence (NoSQL databases © myNoSQL)

VMWare Cloud Foundry Storage Engines: MySQL, MongoDB, Redis

VMWare’s acquisitions at work:

The platform lets you build applications with Java and other JVM-based frameworks such as Grails and Roo, Rails and Sinatra for Ruby and Node.js. The platform plugs into application services such as RabbitMQ and GemFire, both now owned by VMware. […] Cloud Foundry also supports MySQL, MongoDB and Redis, […]

I assume other NoSQL databases will be added to Cloud Foundry, as I doubt Redis and MongoDB are the only ones operationally ready.

As a side note, I’m wondering if this announcement means VMWare is looking for its next acquisition in the direction of 10gen, the makers of MongoDB.

Original title and link: VMWare Cloud Foundry Storage Engines: MySQL, MongoDB, Redis (NoSQL databases © myNoSQL)


Quick Start Hadoop Bundle on Amazon Web Services by Karmasphere

  • Karmasphere is bundling its Studio Professional Edition and Analyst products with Amazon Web Services credits. The bundle is packaged in a virtual machine for use on Linux/UNIX, Windows and MacOS workstations using VMware players.
  • The bundle features “one-button” deployment of Apache Hadoop and Hive applications to AWS and includes 30-day evaluation licenses of Karmasphere commercial products

Is there some sort of competition between Karmasphere and Cloudera? I haven’t heard much about Karmasphere, but that may only mean these two Hadoop ecosystem providers are using different market strategies.

Original title and link: Quick Start Hadoop Bundle on Amazon Web Services by Karmasphere (NoSQL databases © myNoSQL)