Cloud computing: All content tagged as Cloud computing in NoSQL databases and polyglot persistence

VMWare Cloud Foundry Storage Engines: MySQL, MongoDB, Redis

VMWare’s acquisitions at work:

The platform lets you build applications with Java and other JVM-based frameworks such as Grails and Roo, Rails and Sinatra for Ruby and Node.js. The platform plugs into application services such as RabbitMQ and GemFire, both now owned by VMware. […] Cloud Foundry also supports MySQL, MongoDB and Redis, […]

I assume other NoSQL databases will be added to Cloud Foundry, as I doubt Redis and MongoDB are the only ones that are operationally ready.

As a side note, I’m wondering if this announcement means VMWare is looking at 10gen, the makers of MongoDB, for its next acquisition.

Original title and link: VMWare Cloud Foundry Storage Engines: MySQL, MongoDB, Redis (NoSQL databases © myNoSQL)


NoSQL & Cloud at Netflix

Today Netflix can be seen as a leader in what can be achieved by combining cloud computing and polyglot persistence. Better still, Netflix has chosen to share what it has learned so that everyone else can benefit from it.

Netflix’s migration away from an on-premise architecture built on relational databases has been documented over time. Here are a couple of important points in the history of the move from the classical architecture to the mostly-in-the-cloud solution they are currently using, experimenting with, and continuing to build:

And it doesn’t stop here. In the video below, Siddharth “Sid” Anand answers some of the questions on the mind of everyone considering NoSQL databases in the cloud:

  • What sort of data can you move to NoSQL?
  • Which NoSQL technologies are we working with?
  • How did we translate RDBMS concepts to NoSQL?

Original title and link: NoSQL & Cloud at Netflix (NoSQL databases © myNoSQL)


SQL Server and SQL Azure Comparison

SQL Azure provides relational database functionality as a utility service. Cloud-based database solutions such as SQL Azure can provide many benefits, including rapid provisioning, cost-effective scalability, high availability, and reduced management overhead.

If you are ready for the cloud (keep in mind this is not an easy question, as proved by Netflix’s cloud migration and Reddit’s experience), going from on-premise SQL Server to SQL Azure doesn’t seem to involve drawbacks.

But what I’m really curious about is how SQL Azure compares to Amazon RDS.

Original title and link: SQL Server and SQL Azure Comparison (NoSQL databases © myNoSQL)


Reddit's Story of Running Cassandra & PostgreSQL on Amazon EBS

I’m still distilling what happened at Reddit the other day, when failures of EBS in a single availability zone took Reddit down for many hours:

Unfortunately, EBS also has reliability issues. Even before the serious outage last night, we suffered random disks degrading multiple times a week. While we do have protections in place to mitigate latency on a small set of disks by using raid-0 stripes, the frequency of degradation has become highly unpalatable.

[…] we have been working to completely move Cassandra off of EBS and onto the local storage which is directly attached to the EC2 instances. […] While the local storage has much less functionality than EBS, the reliability of local storage outweighs the benefits of EBS.

After the outage today, we are going to be investigating doing the same for our Postgres clusters.

One mistake we made was using a single EBS disk to back some of our older master databases

These may sound like truisms to those working on highly available systems, but not to everybody else:

  • when talking high availability, running your application from a single Amazon availability zone is not enough

  • even if EBS promises “highly available, highly reliable storage volumes”, a solution relying on it will have to account for: 1) failures; 2) unreliable performance.

    An ex-Reddit engineer posted details about the serious issues Reddit noticed while using Amazon EBS.

  • Dynamo-style NoSQL databases, where all nodes in a cluster are equal, tolerate failures more easily than a traditional RDBMS.

    Reddit is working on moving Cassandra off EBS and onto the local ephemeral EC2 storage.

  • A master/slave replication model combined with the out-of-order commits issue makes me think that the cloud and the RDBMS are not yet a perfect match.

    Data which had been committed to the slaves was not committed to the masters. In a normal replication scenario, this should never, ever happen. The master commits the data, then tells the slave it is safe to commit the same data.

  • One mistake we made was using a single EBS disk to back some of our older master databases

  • remember the Amazon EBS vs SSD: Price, Performance, QoS?
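
The replication invariant from the quote in the list above (a slave should never hold a commit its master lacks) can be expressed as a simple set check. This is only a sketch: the transaction IDs and the `find_orphaned_commits` helper are made up for illustration and say nothing about how PostgreSQL replication works internally.

```python
from typing import Iterable, Set


def find_orphaned_commits(master: Iterable[int],
                          slave: Iterable[int]) -> Set[int]:
    """Return transaction IDs committed on the slave but not on the master.

    In a healthy master/slave setup the result is empty, because the
    master commits first and only then lets the slave commit.
    """
    return set(slave) - set(master)


# Healthy case: the slave lags behind but holds nothing extra.
assert find_orphaned_commits(master=[1, 2, 3, 4], slave=[1, 2, 3]) == set()

# The failure described in the quote: the slave has commits the master lacks.
print(find_orphaned_commits(master=[1, 2, 3], slave=[1, 2, 3, 4, 5]))
```

A non-empty result is the "should never, ever happen" situation: the slave can no longer be treated as a faithful copy of its master.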

What else can we learn from Reddit’s experience?

Original title and link: Reddit’s Story of Running Cassandra & PostgreSQL on Amazon EBS (NoSQL databases © myNoSQL)


The Key Technical Challenge of Cloud Computing

Adrian Cockcroft[1]:

The key challenge is to get into the same mind-set as the Google’s of this world, the availability and robustness of your apps and services has to be designed into your software architecture, you have to assume that the hardware and underlying services are ephemeral, unreliable and may be broken or unavailable at any point, and that the other tenants in the multi-tenant public cloud will add random congestion and variance. In reality you always had this problem at scale, even with the most reliable hardware, so cloud ready architecture is about taking the patterns you have to use at large scale, and using them at a smaller scale to leverage the lowest cost infrastructure.

  1. Adrian Cockcroft: Cloud Architect at Netflix, @adrianco  
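
One small pattern that follows from assuming every underlying service may be unavailable at any point is retrying with exponential backoff and jitter. The sketch below is an illustration only, not something taken from Netflix; `call_with_retries` and the simulated `flaky_read` are hypothetical names.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(operation: Callable[[], T],
                      attempts: int = 5,
                      base_delay: float = 0.1,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `operation`, retrying on any exception with exponential backoff.

    The delay doubles after each failure and gets random jitter so that
    many clients do not retry in lockstep. The last error is re-raised.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))
    raise ValueError("attempts must be at least 1")


if __name__ == "__main__":
    failures_left = [2]  # simulate a service that fails twice, then recovers

    def flaky_read() -> str:
        if failures_left[0] > 0:
            failures_left[0] -= 1
            raise ConnectionError("simulated transient failure")
        return "row data"

    print(call_with_retries(flaky_read, base_delay=0.01))
```

Retries only help with transient faults; an outage of a whole availability zone still needs redundancy at the architecture level.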

Original title and link: The Key Technical Challenge of Cloud Computing (NoSQL databases © myNoSQL)


Amazon EBS vs SSD: Price, Performance, QoS

Check the numbers and run your own tests. But their results are striking:

To summarize:

  • Server one in the datacenter is maybe a $10k machine with a $3000 disk array (say $4000 total per year plus colo costs, if you buy the server and rent a rack), responding to the database in generally sub-millisecond latencies, at a throughput of 30-40MB/s with quite a bit of headroom for more throughput.
  • Server two in the cloud costs about $17k to run per year, plus about $1500 per year in disk cost (up to $3000 per year now that they’ve added 10 more volumes), and is responding to the database in the tens and hundreds of milliseconds — highly variable from second to second and device to device — and causing horrible database pile-ups.
  • We’re comparing apples and oranges no matter what, but put simply, price is in the same order of magnitude, but performance is two to three orders of magnitude different.
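
A quick back-of-the-envelope check of that last bullet, as a sketch. The dollar figures come from the quote and colo costs are ignored. The latency values (0.5 ms for "sub-millisecond", 50 ms and 500 ms for "tens and hundreds of milliseconds") are assumptions picked only to illustrate the ratio, not measurements.

```python
import math

# Yearly cost figures taken from the quoted summary (colo costs ignored).
datacenter_cost = 4_000            # server + disk array, per year
cloud_cost_low = 17_000 + 1_500    # instance + original EBS volumes
cloud_cost_high = 17_000 + 3_000   # instance + EBS after 10 more volumes

# Latencies are ASSUMED representative values, not measurements.
datacenter_latency_ms = 0.5        # "sub-millisecond"
cloud_latency_ms = (50.0, 500.0)   # "tens and hundreds of milliseconds"


def orders_of_magnitude(a: float, b: float) -> float:
    """Return log10 of the ratio a/b."""
    return math.log10(a / b)


cost_gap = [orders_of_magnitude(c, datacenter_cost)
            for c in (cloud_cost_low, cloud_cost_high)]
latency_gap = [orders_of_magnitude(l, datacenter_latency_ms)
               for l in cloud_latency_ms]

print("cost gap (orders of magnitude):", [round(g, 2) for g in cost_gap])
print("latency gap (orders of magnitude):", [round(g, 2) for g in latency_gap])
```

Under these assumptions the cost gap stays below one order of magnitude while the latency gap lands between two and three, which matches the claim in the quote.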

Two thoughts popped into my head after reading the post:

  1. what kind of virtualization is Joyent using to offer such consistent results for Riak’s benchmark?
  2. Joe Stump[1]: “I wouldn’t consider my startup not to use the cloud for all our applications”[2]

  1. Joe Stump: SimpleGeo founder, ex-Digg  

  2. This is a quotation from memory.  

Original title and link: Amazon EBS vs SSD: Price, Performance, QoS (NoSQL databases © myNoSQL)


Forrester report: SQL Azure Raises the Bar on Cloud Databases

Got the link to this Forrester report about SQL Azure (PDF), authored by Noel Yuhanna, from SQL Azure - The Year in Review:

Most customers stated that SQL Azure delivers a reliable cloud database platform to support various small to moderately sized applications as well as other data management requirements such as backup, disaster recovery, testing, and collaboration. Unlike other DBMS vendors such as IBM, Oracle, and Sybase that offer public cloud database largely using the Amazon Elastic Compute Cloud (Amazon EC2) platform, Microsoft SQL Azure is unique because of its multitenant architecture, which allows it to offer greater economies of scale and increased ease of use. […] Application developers and database administrators seeking a cloud database will find that SQL Azure offers a reliable and cost-effective platform to build and deploy small to moderately sized applications.

There are a couple of inconsistencies in the document, but the SQL Azure case studies section is worth reading.

Back to the fun part. In the pros section:

High availability at no extra effort or cost. […] In addition, SQL Azure automatically offers built-in server and storage redundancy, a data replication solution for built-in high availability, and transparent application failover to ensure minimal disruption.

The cons section:

Zero downtime availability. Although SQL Azure supports failover architecture should a database server fail, there is some downtime, ranging from a few seconds to minutes, associated with switching the application over to another server.

Back to pros:

Scale-out capacity growth via a sharded data platform. SQL Azure offers the ability to shard data into hundreds or even thousands of logical databases, which developers can use collectively for a given application.

and in the cons:

Automatic sharding of data for extreme scalability. SQL Azure does not automatically shard data into various partitions to scale across physical servers.
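
Since the report says SQL Azure does not shard automatically, the application has to decide which logical database holds a given row. A minimal sketch of hash-based routing follows. The database names, the shard count, and the `shard_for` helper are hypothetical and not part of any SQL Azure API.

```python
import hashlib
from typing import Sequence

# Hypothetical logical databases; the names are illustrative only.
SHARDS = [f"appdb_{i:03d}" for i in range(8)]


def shard_for(key: str, shards: Sequence[str] = SHARDS) -> str:
    """Map a sharding key to one logical database.

    Uses a stable hash (MD5 of the UTF-8 key) so every application
    server picks the same shard for the same key.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


if __name__ == "__main__":
    for customer_id in ("customer-17", "customer-42", "customer-99"):
        print(customer_id, "->", shard_for(customer_id))
```

Plain modulo routing is the simplest choice, but changing the number of shards remaps most keys, so growing the cluster means moving data between databases.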

Just focus on the three case studies included in the paper (PDF).

Markus ‘maol’ Perdrizat

Original title and link: Forrester report: SQL Azure Raises the Bar on Cloud Databases (NoSQL databases © myNoSQL)

Preliminary Comparison of Database.com and SQL Azure Features and Capabilities

Extensive comparison of the upcoming Database.com and Microsoft’s SQL Azure: Salesforce.com will unbundle its underlying relational database engine when the firm releases Database.com’s commercial version in 2011. In the meantime, developers can test-drive Database.com with a free developer account, which includes a database having:

  • Three enterprise user accounts
  • 100,000 rows of storage per month
  • 150,000 transactions per month

According to the article, Database.com will support ACID transactions (Apex code), triggers and stored procedures (Apex code), relationships, a query language, and full-text search. It looks like a relational database in the cloud, but it doesn’t necessarily need to be one underneath.

Original title and link: Preliminary Comparison of Database.com and SQL Azure Features and Capabilities (NoSQL databases © myNoSQL)


Cassandra on EC2: A Presentation

After taking a look at the possible models of running both SQL and NoSQL in the cloud and the very, very detailed guide for Hadoop on EC2, today it is time to check Dave Gardner’s slide deck[1], which answers questions like:

  • why consider Amazon EC2 for Cassandra?
  • what are the challenges of running Cassandra on EC2?
  • Cassandra on EC2: good or bad idea?

  1. I’ve finally figured out a way to make these slide decks available on both normal browsers and non-Flash-enabled mobile browsers (iOS, iPhone, iPad). Give it a try and let me know if it works for you.

Original title and link: Cassandra on EC2: A Presentation (NoSQL databases © myNoSQL)

SQL and NoSQL In the Cloud

Options of running RDBMSs in the cloud:

  • Install and Manage – in this “traditional” model the developer or sysadmin selects their DBMS, creates instances in their cloud, installs it, and is then responsible for all administration tasks (backups, clustering, snapshots, tuning, and recovering from a disaster). […]
  • Use a Cloud-Managed DBaaS Instance – in this model the cloud provider offers a DBMS service that developers just use. All physical administration tasks (backup, recovery, log management, etc.) are performed by the cloud provider and the developer just needs to worry about structural tuning issues (indices, tables, query optimization, etc). […]
  • Use an External Cloud-Agnostic DBaaS Solution – this is very much like the cloud-based DBaaS, but has a value of cloud-independence – at least in theory. In the long run you might expect to be able to use an independent DBaaS to provide multi-cloud availability and continuous operations in the event of a cloud failure.

I guess these are equivalent to applying the Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and Database-as-a-Service (DaaS) models to persistence (nb: DaaS can be seen as a more specialized PaaS). The same approach applies to NoSQL databases, as these models are orthogonal to the persistence problem.

RDBMS and NoSQL database in the cloud

Original title and link: SQL and NoSQL In the Cloud (NoSQL databases © myNoSQL)


Riak SmartMachine Benchmark: The Technical Details

Remember the Riak in the Joyent cloud benchmark? There’s a post providing many more details about the tests run:

The goal of the study was to demonstrate a baseline for users to understand Riak’s performance, stability, predictability, and linear scalability.  The systems were not tuned for optimal performance.  Instead, we chose to take standard 4 GB Riak Smartmachines and demonstrate throughput and latency for various access patterns and object sizes.

The conclusions are what made me say it got atypical (in a good sense) results:

Our benchmark tests bring us to the following conclusions:

  • Riak behaves predictably under high loads – depending on system resources, Riak exhibits either predictable, steady-state throughput with low errors or degrades gracefully with low errors.
  • Riak demonstrates stability under high loads – very few errors, no node failures under load, and behavior in line with expectations.
  • Riak demonstrates linear scalability – adding or removing capacity adds or subtracts a predictable amount of capacity from the cluster.

Original title and link: Riak SmartMachine Benchmark: The Technical Details (NoSQL databases © myNoSQL)


MongoDB on Amazon EC2 with EBS Volumes

In general a striped EBS volume will improve the iowaits. However, in our own tests we see irregular latency spikes during the day from any given EBS backed device, sometimes peaking at 600ms.

We don’t have a good explanation of what is happening on EC2, but we will continue to dig.

That’s one of the reasons I’ve said the Riak on Joyent benchmark is really impressive.

Original title and link: MongoDB on Amazon EC2 with EBS Volumes (NoSQL databases © myNoSQL)